RUNNING MEMORY SPAN¹


IRWIN POLLACK, LAWRENCE B. JOHNSON, AND P. ROBERT KNAFF
Operational Applications Laboratory, Air Force Cambridge Research Center 


To an organism repeatedly bom- 
barded with information, the dropping 
out of items of information (for- 
getting) is as important as the acquisi- 
tion and storage of new items of 
information. Indeed, in many prac- 
tical situations, as in the monitoring 
of a display, the operator is required 
only to report information of the 
recent past history of the display. 
A characteristic feature of many 
monitoring tasks is, apparently, that 
the operator is uncertain when he will 
be interrogated about the display. 

In the present study, an estimate 
is obtained of the number of items 
that can be retained when S is un- 
certain of the point of message 
interrogation. This task is also com- 
pared with the typical message-recall 
task in which S has knowledge of the 
length of the to-be-presented message. 

The monitoring task with uncertain
length is called the running memory task.²


1 This is Technical Report AFCRC TR-58- 
2, AD 152563, of the Air Force Cambridge 
Research Center. This research supports 
Project 7682 of the Air Research and Develop- 
ment Program in Human Engineering. 

2 Concurrent studies of the running memory
span have been completed at the Lincoln
Laboratories under the direction of Nancy 
Waugh, and at the International Business 
Machines Research Center under the direc- 
tion of Nancy Anderson. 


This term is employed to suggest that 
S must drop out old items and capture 
new items successively over a tem- 
poral course to perform his task. 
And, since the reproduction of digits 
is involved in the present studies, 
performance in the running memory 
task will be referred to as the running 
digit span (RDS). It may be noted 
that our work has been anticipated 
by Oberly (1928) who compared 
Binet-type memory span of messages 
presented in haphazard order with 
messages presented in serial order. 
This study showed little difference 
in the recall scores under the two 
conditions. 

Four experiments will be reported: 
the first was a pilot examination with 
a large number of university students 
in order to determine the range of 
individual differences and the effect 
of two experimental variables on the 
RDS with untrained Ss; and the 
next three were studies with a small 
number of Ss in order to determine 
the interactions among several vari- 
ables as they determine the RDS. 


EXPERIMENT I: NORMATIVE 
STUDY 
Method 


This experiment was performed by one of 
the authors (PRK) at the University of 
Maryland. Sixty-eight students, in three
separate groups, listened to a single recorded 
tape. Upon this tape were the instructions 
and 16 spoken messages. The messages were 
composed of randomly selected decimal digits. 
The order of presentation was two messages 
at each of the following rates: 4, 2, 1, and .5 
digits per second; and a counterbalanced 
repetition of these rates. Message lengths 
of 25, 30, 35, and 40 digits were presented in 
an approximately counterbalanced order. 
The Ss were not informed of the range of 
message lengths used. 
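
The stimulus description above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the digits are drawn with a pseudo-random generator rather than the authors' (unspecified) selection procedure, and the duration is simply message length divided by presentation rate.

```python
import random

# Message lengths and rates (digits per second) as stated for Exp. I.
LENGTHS = (25, 30, 35, 40)
RATES = (4, 2, 1, 0.5)


def make_message(length, rng):
    """Return `length` randomly selected decimal digits (hypothetical helper)."""
    return [rng.randrange(10) for _ in range(length)]


def duration_seconds(length, rate):
    """Time needed to present `length` digits at `rate` digits per second."""
    return length / rate


rng = random.Random(1958)  # arbitrary seed, only to make the sketch repeatable
message = make_message(LENGTHS[0], rng)
print(len(message))                 # 25
print(duration_seconds(25, 4))      # 6.25
print(duration_seconds(40, 0.5))    # 80.0
```

On this reading, the shortest message at the fastest rate lasts about 6 sec., and the longest message at the slowest rate lasts 80 sec.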

The tape-recorded instructions were: “You
are about to take part in a study of what is 
most easily termed ‘Backward memory.’ A 
group of numbers will be read and, after 
each group is completed, you will write as 
many of the last numbers in the group as you
can remember. You won't know how long 
each group will be, but you will know when 
the group has ended. A tone, like this 
[tone sounded], will sound at the end of each 
group. After you have heard the tone, start
writing as many of the last numbers in the 
group as you can recall. It may be easier 
to start with the last number in the group 
and work backwards, or it may be easier to 
fill in the rows in a forward direction. Just
make sure that the number you write under 
the column marked ‘Last Digit’ corresponds 
to the last number of the group you just heard, 
and that the number you write under the 
column marked ‘Next-to-last Digit’ cor- 
responds to the next-to-last number of the 
group, and so on.” 

Before starting the actual tests, Ss were 
informed that four rates would be employed, 
and were then given an example of each rate. 


Results 


The average number of digits, 
correctly identified and correctly posi- 
tioned from the end of the message, 
is presented on the abscissa in Fig. 1. 
The ordinate of Fig. 1 presents the 
cumulative distribution of the average
scores of the individual Ss, averaged
over the variables of rate of message
presentation and message length,
upon a normal probability plot.
Within the 15%-98% range, the
normal probability plot is an adequate
description of the distribution. The
median span is 4.1 digits; the mean
span is 4.2 digits, and the SD is
.75 digit. Redefinition of the RDS
as the number of successive correctly
identified digits from the end of the
message decreases the mean span by
less than 5%.

Fig. 1. The distribution of average auditory
memory spans among 68 university
students. The abscissa is the running
memory span; the ordinate is the cumulative
percentage of cases scaled upon a probability
grid. Each score is based on the average of
16 messages.
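
The two scoring rules mentioned in this section (digits correctly identified and correctly positioned from the end of the message, and successive correctly identified digits from the end) may be easier to compare in code. The sketch below is an assumption about how such scoring could be carried out, not the authors' procedure; the function names and the sample message are hypothetical.

```python
def positional_score(message, response):
    """Count digits that match the message at the same position from the end."""
    pairs = zip(reversed(message), reversed(response))
    return sum(1 for presented, written in pairs if presented == written)


def successive_score(message, response):
    """Count the unbroken run of matching digits, starting at the last digit."""
    run = 0
    for presented, written in zip(reversed(message), reversed(response)):
        if presented != written:
            break
        run += 1
    return run


# Hypothetical message and written recall, both listed oldest digit first.
message = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
response = [9, 7, 6, 5, 3]

print(positional_score(message, response))  # 4
print(successive_score(message, response))  # 3
```

In this example the fourth digit from the end is wrong, so the successive score stops at 3 while the positional score still credits the correct fifth digit. The successive score can never exceed the positional score.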

With these university students, 
a score of about 7 digits is not un- 
reasonable in the known-length recall 
experiment. Thus, the results sug- 
gest that, at least with inexperienced 
Ss, there is a substantial difference 
between the running-length and 
known-length digit spans. 

Separate analyses of variance were 
performed for message length vs. Ss 
and for rate of digit presentation vs. 
Ss. For each analysis, the results 
were pooled over the other variable. 
Each of the two main variables was
found to be statistically significant.
The t test for mean differences showed
a significantly lower span at the highest
rate of digit presentation and a
significantly higher span at the shortest
message length.


EXPERIMENT II: LONGITUDINAL 
EXAMINATION OF RDS 


Method 


A small group of Ss was examined over 
a longer testing period to determine if the 
RDS would approach the known-length digit 
span after considerable practice. The crew 
was tested for 5 hr. a day (with frequent 
rest pauses), for two days a week, for a total 
period of 13 weeks. 

The instructions were essentially identical 
with Exp. I. In addition, Ss were told that 
all messages were between 4 and 40 decimal 
digits in length. 

In half of the tests, before the presentation 
of each message, Ss were informed of the 
length of the to-be-recalled message. In 
the other half, Ss were not so informed. We 
shall use the terms certain length and uncertain
length to distinguish between these two 
instructional conditions. In the course of 
the entire experiment, a given message was 
repeated four times: two presentations with 
certain-length messages and two presenta- 
tions with uncertain-length messages. Al- 
together, there were five Ss, 15 message 
lengths, two length certainties, and four 
rates of digit presentation. The message 
variables were counterbalanced within eight 
replications of the experiment. In addition, 
a rate of digit presentation of one digit every 
4 sec. was sampled during two of the replica- 
tions of the experiment. 

The main procedural difference from 
Exp. I was that each S corrected his own 
paper after reproduction of each message 
(E read the corrected version from the end 
of the message) ; further, the rate of presenta- 
tion was constant within each group of 15 
messages. The recorded talker was that of 
Exp. I, except for the tests carried out at one 
digit every 4 sec. 


Results 


The average digit span is presented 
in Fig. 2 as a function of the length 
of message. The upper curve is 
associated with messages of known 
or certain length (C); the lower curve 
with messages of uncertain length 
(U). Scores represented in Figs.
2-8 are the average number of successive
correctly identified units from
the end of the message. The results
for messages of length 4-7 digits have
not been plotted in Fig. 2. They
fall directly upon the smooth curves
for messages of certain length and
uncertain length.

Fig. 2. Auditory memory span as a
function of message length. The abscissa is
the length of the presented message. The
ordinate is the digit span in terms of the
number of digits successively correct from
the end of the message. The curve marked
(C) is associated with messages of known
or certain length; the curve marked (U)
is associated with messages of unknown or
uncertain length. Each point is based upon
160 messages (five subjects, eight blocks,
four rates of presentation). The points
associated with message length of 4-7 digits
fall on the smooth curves.

In general, there is a consistent 
difference in favor of messages of 
known length. However, for shorter 
messages, the spans are effectively 
limited by the length of the message. 
The remainder of the analysis will, 
therefore, be confined to messages of 
15 units and longer. For these mes- 
sages, the average difference between 
the two spans, running and certain 
length, is 1.4 digits. 

Individual differences among the 
five Ss may be noted. For the least 
proficient S, the average auditory 
digit span (averaged over all variables,
but restricted to messages of
15 units and longer) was 5.6 digits
for uncertain-length messages, and
6.4 digits for certain-length messages.
The corresponding average spans obtained
by the most proficient S were
7.3 and 8.4 digits, respectively. In
general, Ss scoring high with uncertain-length
messages scored high with
certain-length messages (r = .99).

TABLE 1

ANALYSIS OF VARIANCE OF AUDITORY
DIGIT SPANS

[Table entries not recoverable.]

Note. This analysis was performed upon messages
of 15-40 digits. [...] first-order variables
were tested against the first-order
residual variance. If significance for each [illegible] were
tested against the highest [illegible] variance of the
next order, e.g., B vs. B X R, all of the variables still
achieve significance, although the significance level of
one variable changes from the .01 level to a somewhat
higher level. The residual error variance does not
include the insignificant factors. When the insignificant
variables were pooled within residual error variance,
the significance level of the factors remained
unchanged. In view of the high heterogeneity of
variances (see Fig. 5), only those variables meeting the
.01 level of significance should be interpreted as
significant.
* P < .05.
** P < .01.

An analysis of variance was per- 
formed upon messages of 15 units 
and longer, with results pooled over 
Ss. A summary of the analysis is 
presented in Table 1. It may be 
noted that all first-order variables, 
i.e., length of message (L), rate of 
presentation (R), trial-block (B), 
and message-length certainty (C) 
were highly significant. A trial-block
in Table 1 is defined as the presenta- 
tion of a single message at each of 15 
lengths at each of 4 rates of presentation
under conditions of certain and uncertain
length, a total of 120 messages
for each of five Ss. Furthermore, all 
significant interactions including mes- 
sage-length certainty (C) also included 
rate of presentation (R). The vari- 
ance unaccounted for by significant 
effects amounts to only 1.24% of the 
total variance. The over-all average 
estimated standard error of a single 
measurement (the average span yielded 
by a single message presented to the 
testing crew of 5 Ss) was 1.0 digit. 
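
The message counts quoted for Exp. II follow directly from the stated factor levels. The short check below is only a bookkeeping sketch; the dictionary names are hypothetical, and the total over eight blocks assumes that each of the eight replications corresponds to one trial-block as defined above.

```python
from math import prod

# Factor levels as stated in the text for Exp. II.
trial_block = {"lengths": 15, "rates": 4, "certainties": 2}
fig2_point = {"subjects": 5, "blocks": 8, "rates": 4}

messages_per_block = prod(trial_block.values())
messages_per_fig2_point = prod(fig2_point.values())

print(messages_per_block)          # 120
print(messages_per_fig2_point)     # 160
# Assumption: eight replications, one trial-block each, per S.
print(messages_per_block * 8)      # 960
```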

The role of rate-of-digit presenta- 
tion is shown in Fig. 3. The abscissa 
is the rate-of-digit presentation, in 
digits per second; the ordinate is 
the digit span. The upper curve is 
associated with messages of certain 
length; the lower curve, with mes- 
sages of uncertain length. The thin 
straight lines represent the linear 
functions fitted by the method of 
least squares. 

Fig. 3. Auditory memory span as a function
of the rate of message presentation in
digits per second. Each point is based upon
240 messages with results averaged over
other variables.

The main point of Fig. 3 is that
the large differences between the
two digit spans observed at low rates
of digit presentation are sharply
narrowed at rapid rates of digit
presentation. The additional tests
carried out at one digit every 4 sec.
demonstrate still additional improvements
in the digit spans at rates
slower than one digit every 2 sec.
For example, at comparable periods
of the experiment, uncertain- and
certain-length spans of 12.5 and 14.4
digits were observed at a presentation
rate of .25 digit per sec.; at comparable
periods of the experiment the
corresponding spans at a rate of .5
digit per sec. were 9.3 and 11.2 digits,
respectively.

The interaction between practice, 
rate, and length certainty is presented 
in Fig. 4. Figure 4 presents the 
auditory digit span for both certain- 
length (C) and uncertain-length (U) 
messages as a function of practice 
for the four rates of presentation. 
From left to right, the sections of the 
graph are associated with successively 
slower rates of presentation. 

Fig. 4. Auditory memory span as a joint
function of practice and rate of message
presentation. Each section of the figure is
associated with a single rate of presentation.
Each point is based upon 30 messages with
results averaged over other variables.

The main point of Fig. 4 is that the
differences between the certain- and
uncertain-length spans found early
in practice are maintained over an
extended period of practice. In fact,
if the four sections of Fig. 4 are
combined, the slope of the certain-length
function (C) is about 10%
greater than that of the uncertain-length
function (U). The lower span
for messages of uncertain length is,
therefore, probably not an artifact
of the extent of practice with known-length
memory tasks.

Because of the large changes in 
performance observed with practice, 
and because each message was pre- 
sented four times in the course of the 
experiment, it became necessary to 
test whether learning was specific to 
the particular messages employed. 
Control tests employing a newly
constructed set of messages and the 
old set of messages were introduced 
at the end of the experiments. Scores 
on the new and old messages were 
nearly identical. The average spans 
for messages pooled over Ss, message 
length, and length certainty were 
9.6 digits for the previously employed 
messages and 9.5 digits for new control 
messages. The differences between 
spans do not approach significance. 
It is concluded that the improvement 
with practice was not due to learning 
of the particular messages employed. 

The second point of Fig. 4 is that 
for both the certain- and uncertain- 
length tasks, the large changes ob- 
served with practice are principally 
associated with the slower, rather than 
the more rapid, rates of presentation. 

Figure 5 presents the variability 
of digit-span scores as a function of 
the rate-of-digit presentation. The 
abscissa is the rate of presentation, 
in digits per second. The ordinate 
is the average variance score between 
(inter) and within (intra) Ss. The 
parameter on the curves is length 
certainty. 

Fig. 5. Average inter- and intra-S variance
as a function of rate of presentation
for two conditions of certainty. The calculation
of inter-, or between-Ss, variance was
based upon six separate variance calculations,
one for each message length, representing
the variability of scores among six Ss. The
calculation of intra-, or within-Ss, variance
was based upon six separate variance calculations,
one for each S, representing the variability
of scores among six message lengths.
In order to provide comparable data for all
points on this graph, scores for the four fastest
rates were taken from Block 5, since the
slowest rate was administered just prior to
that block. Each point represents the average
of six variances.

Three features are noted. The
average variance consistently decreases
as the rate of digit presentation
is increased, with the largest
changes occurring between .5 and 1
per sec. Second, with the exception
of small differences at the fastest
presentation rate, the average variance
associated with uncertain-length
messages is greater than that associated
with certain-length messages.
Third, the inter-S variance is consistently
greater than, and exhibits
a steeper slope than, the intra-S
variance.
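
The inter-S and intra-S averages described for Fig. 5 can be sketched as follows. The score table is hypothetical and much smaller than the one used in the study, and the use of the sample variance (n - 1 in the denominator) is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, variance

# Hypothetical spans: one row per S, one column per message length.
scores = [
    [6.0, 5.5, 5.0],
    [8.0, 7.0, 6.0],
    [7.0, 6.5, 6.0],
]


def inter_s_variance(table):
    """Average, over message lengths, of the variance of scores among Ss."""
    return mean(variance(column) for column in zip(*table))


def intra_s_variance(table):
    """Average, over Ss, of the variance of scores among message lengths."""
    return mean(variance(row) for row in table)


print(round(inter_s_variance(scores), 3))  # 0.639
print(round(intra_s_variance(scores), 3))  # 0.5
```

Each function yields one point of the kind plotted in Fig. 5 for a given rate and certainty condition.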


EXPERIMENT III: GROUPING TESTS
Method 


In the course of the experiment, Ss re- 
peatedly reported that they attempted to 
group items in recall of the materials. There- 
fore, a set of tests was introduced in which 
grouping was imposed upon the message 
presentation. For example, a message with
a grouping of four was read in the following 
manner: 2632, (pause), 4167, (pause), 8320, 
etc. The average rate of presentation was 
adjusted to 1 digit per sec. for all tests. 
The grouping tests were carried out at the
end of Trial Blocks 2, 4, 6, and 8.
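
The imposed grouping can be illustrated with a short sketch that reproduces the example given above. The function names are hypothetical, and the sketch covers only the segmentation of the message; how pause and digit durations were adjusted to hold the average rate at 1 digit per sec. is not stated in the source and is not modeled.

```python
def group_digits(digits, size):
    """Split a digit string into consecutive groups of `size` digits."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def spoken_form(digits, size):
    """Render the groups with a pause marker between successive groups."""
    return ", (pause), ".join(group_digits(digits, size))


print(spoken_form("263241678320", 4))
# 2632, (pause), 4167, (pause), 8320
```

With a group size of one, every digit stands alone, which corresponds to the ungrouped presentation of the earlier tests.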


Results 


The results of the grouping tests 
are presented in Fig. 6 as a function 
of the number of digits per grouping. 
The ordinate is the digit span; the 
separate curves are associated with 
messages of certain and uncertain 
length. A grouping of one is identical 
with the conditions of the previous 
tests. 

Highest spans are associated with 
a grouping of four units for both 
messages of certain and uncertain 
length. For messages of uncertain 
length, the span for a grouping of 
four digits was significantly greater 
than for all other groupings (with 
the exception of a grouping of three 
digits). None of the mean differences 
was significantly different for messages
of certain length. Analysis of
variance indicates that grouping is
a significant variable and that the
interaction of grouping X length
certainty is also significant. Supporting
evidence for the advantage of
grouping is presented in Martin and
Fernberger (1929).

Fig. 6. Auditory memory span as a function
of digit grouping. Each point is based
upon 20 messages (five lengths, four replications).


EXPERIMENT IV: VISUAL TESTS
Method 


Between Trial Blocks 4 and 5, tests were 
carried out with visual messages. Digits 
were successively flashed by a teletype 
programming unit. The first line of the 
display box exposed the numerals 0 to 3; 
the next line exposed the numerals 4 to 7. 
The average visual angle subtended by the 
extreme numerals, e.g., 0 and 3, was approxi- 
mately 4°. The range of message lengths 
was 4-22 digits. The rates of message 
presentation employed were .25, .71, and 
2 digits per sec. The number of possible 
alternative digits per message unit was 2, 4, 
and 8 digits. For example, with two possible 
alternatives, only the digits 0 and 1 were 
flashed; with four alternatives, only the 
digits 0, 1, 2, 3 were flashed, etc. 
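
Varying the number of alternatives per message unit varies the information carried by each digit. Assuming equally likely and independent digits (an assumption consistent with random selection, though not spelled out in the source), the information per digit is the base-2 logarithm of the number of alternatives. The function name below is hypothetical.

```python
from math import log2


def bits_per_digit(alternatives):
    """Information per digit, in bits, for equally likely alternatives."""
    return log2(alternatives)


# 2, 4, and 8 alternatives were used in the visual tests;
# 10 corresponds to the full set of decimal digits.
for alternatives in (2, 4, 8, 10):
    print(alternatives, round(bits_per_digit(alternatives), 2))
# 2 1.0
# 4 2.0
# 8 3.0
# 10 3.32
```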


Results 


Figure 7 demonstrates the role of 
message length and corresponds
roughly to the presentation of Fig. 2 
for auditory spans. The difference 
in the spans associated with messages 
of certain and uncertain length is 
less than that obtained with the 
auditory span. However, it may 
be noted that the range of message 
lengths in the visual tests was con- 
siderably smaller than with the 
auditory tests. 

Fig. 7. Visual memory span as a function
of the length of presented message. Each
point is based upon 126 messages (seven Ss,
two blocks, three rates, three alternatives
per digit).

The role of the number of possible
alternatives per message unit is
presented in Fig. 8 (note the expanded
scale of the ordinate). The
important point of Fig. 8 is that there
is little change in the digit span above
two alternatives per unit, although
there is a slight superiority at two
alternatives per unit. (Note: if Ss
merely responded by chance over
several units, spans at two alternatives
per unit would have been about
2/3 digit higher than with four alternatives
per unit.)
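
The figure of about 2/3 digit can be recovered under one simple reading of "responding by chance": each guessed digit is correct with probability 1/k for k alternatives, guesses are independent, and the score is the run of successive correct digits from the end. The expected run is then the geometric sum p + p^2 + ... = p/(1 - p). This is an assumed model that happens to reproduce the stated value, not a derivation given in the source; the function name is hypothetical.

```python
from fractions import Fraction


def expected_chance_run(alternatives):
    """Expected run of successive correct guesses, counted from the end."""
    p = Fraction(1, alternatives)   # chance of guessing one digit correctly
    return p / (1 - p)              # sum of p**n for n = 1, 2, 3, ...


two = expected_chance_run(2)    # 1 digit
four = expected_chance_run(4)   # 1/3 digit
print(two - four)               # 2/3
```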

Fig. 8. Visual memory span as a function
of the number of alternatives per message
unit. Each point is based on 126 messages
with results averaged over other variables.

An analysis of variance of the visual
tests (for the three longest lengths)
is presented in Table 2. The third-order
residual variance was about
6.4% of the total variance. The
over-all average estimated standard
error of a single observation (a
single message to a single observer)
was about 4.0 digits.

TABLE 2

ANALYSIS OF VARIANCE OF VISUAL
DIGIT SPANS

Source               Entry
Certainty (C)        5*
Blocks (B)           21 [?]
Rates (R)            137**
Alternatives (A)     17 [?]
Ss                   21 [?]
Lengths (L)          2
Residual (1)
[Second- and third-order rows illegible; surviving entries: 4.0**, 2.0*, 3.8**, 3.1 [?], 2.4*, 2.8 [?], 1.8**]
[...] R X A          24
Residual (3)         619

Note. This analysis was performed upon messages
of 15-22 digits. See note to Table 1. Column headings
are not recoverable; [?] marks an illegible significance marker.

* P < .05.

** P < .01.

Because the visual digit span scores 
of Fig. 7 are substantially higher than 
the spans of Fig. 2, control tests were 
carried out at the end of Trial Block 
8 to compare auditory and visual 
spans under identical conditions. In 
this control test, for both auditory 
and visual presentation, the number 
of alternatives per message was 10,
the rate of presentation was 1 digit
per sec., and the range of message
lengths was 4-40 units. These tests
demonstrated that the auditory and 
visual spans were essentially identical 
for equivalent test conditions. For 
certain-length messages, the average 
auditory and visual span was 11.6 
digits; for uncertain-length messages, 
the average auditory span was 9.4 
digits and the visual span was 8.6 
digits. The latter difference was 
statistically not significant (P > .4). 


DISCUSSION 


The present study confirms a host of 
previous studies which demonstrate that 
the level of performance in a recall task 
is related to Ss’ knowledge of selected 
characteristics of the to-be-performed 
task (Brown, 1954). In addition, it 
singles out message-length certainty 
as a significant variable in the recall of 
digits. 

More important, it appears that the 
running memory task may throw some 
light upon relevant factors in short-time 
information storage. Specifically, it is 
convenient to consider the differences 
in performance between certain-length 
and uncertain-length messages in terms 
of two factors. These two factors will 
be termed: proactive interference and 
behavioral strategy. 

Proactive interference is associated 
with the fact that the digit span is 
decreased for long messages (Fig. 2 and 
the significant L X C interaction of
Table 1). In effect, we assume that the 
initial portion of a message may interfere 
with the retention of the latter portion 
of the message. The interference is 
seen in terms of the decrement in spans 
for long messages, especially for long 
messages of uncertain length. More- 
over, since Ss effectively ignore the 
initial portion of long known-length 
messages, it is reasonable that the decre- 
ment in spans is substantially lower for 
known length than for messages of 
unknown length. 

Behavioral strategy is associated with 
operations performed by S which have 
proved successful in aiding message 
retention. A very simple compound 
set of strategies is observed with long 
known-length messages, especially at
slow and moderate rates of
presentation. A decision is first reached
on the length of the message which can 
be successfully challenged. The S 
simply counts the digits, ignoring the 
initial portion of the message, and then 
actively attempts to challenge a small 
section of the message. It may be 
noted that even with known-length 
messages, extremely large differences are 
observed within the same S on successive 
trials. For example, Ss often reported,
“I bit a bigger hunk than I could chew.” 

Another behavioral strategy is the 
grouping of the presented information. 
In addition to Ss’ reports and observa- 
tions made during overt rehearsal, the 
observed differential effect of grouping 
(Fig. 6) lends support to the employ- 
ment of a grouping strategy. That the 
effect of grouping is minimal for the 
known-length messages suggests that Ss 
superimposed their own grouping struc- 
ture upon the materials. 

Another observed behavioral strategy 
is overt repetition of the digits at slow 
rates of message presentation. The 
analogy to a repeated tape loop for the 
purpose of “burning in” the required
digits is tempting. Indeed, we observe 
substantial improvements at still slower 
rates for both certain- and uncertain- 
length messages. 

To account for the larger effect of rate 
of presentation upon messages of known 
length than upon messages of unknown
length, we consider both interference
effects and strategic operations. The 
interference effects are greater for un- 
known-length messages than for known- 
length messages, presumably, because 
in the latter case, the initial portion is 
effectively ignored. The effect of rate of 
presentation is to limit the range of 
strategic operations performed by S. 

A simple interaction between these 
two effects might be expected to change 
the slope of the function in Fig. 3 (the 
greater the interference, the lower is 
the slope of the digit span vs. rate of 
presentation). 

An alternative to the change of slope 
involves an assumption about the pro- 
active interference effects of materials 
at various degrees of learning. We 
assume, and the experimental literature 
on retroactive inhibition supports this 
assumption (Melton & Irwin, 1940), 
that greater interference effects are 
associated with materials which have 
been presented several times than with 
materials which have been presented
a lesser number of times. Since the
number of overt rehearsals is inversely
related to the rate of presentation, the
greater interference effects at slower
rates are not unreasonable in terms of 
our assumptions. At extremely slow 
rates of presentation, however, a well- 
motivated S can rehearse the entire 
length of the sequence that he is willing 
to challenge. In this circumstance, it 
would seem reasonable that there would 
be little difference in the spans for mes- 
sages of known or unknown length at 
extremely slow rates of presentation. 
The results of sample tests carried out 
at one digit every 4 sec. suggest that differences
between the spans are decreased
at slower presentation rates. It also 
may be noted that, in the retroactive 
inhibition literature (Melton & Irwin, 
1940), extremely well-practiced materials 
may be associated with less interference 
than materials at other stages of practice. 

We realize that, with two factors, one
can describe an extremely wide range
of phenomena. Therefore, we temper
whatever success we have had in describing
the results. Nevertheless, the present
examination is not inconsistent with
the facts of short-time informational
storage.

SUMMARY 


The present series of experiments compared 
the recall of messages composed of randomly 
selected digits under two conditions of 
presentation: (a) uncertain length, in which 
Ss were uncertain of the length of the to-be- 
presented message; and (b) certain length, 
in which Ss were informed of the length of the 
to-be-presented message. A pilot study
revealed digit spans for uncertain-length 
messages which were considerably smaller 
than would be expected with certain-length 
messages. Further studies examined the 
effects on the recall of uncertain- and certain- 
length messages of a wide range of experi- 
mental variables. 

The major findings were: 


1. Except for limiting ranges of conditions, 
as in 100% correct response, recall perform- 
ance with messages of uncertain length is 
significantly poorer than performance with 
certain-length messages under all conditions 
of testing. 

2. The difference in recall performance 
between certain- and uncertain-length mes- 
sages is maintained despite considerable 
amounts of practice. Spans for both types
of messages significantly improve with prac- 
tice. The improvement with practice is 
not specific to the messages employed. 

3. An increase in rate of digit presentation 
is associated with: (a) a decrease in digit 
span for both types of messages; (b) a rela- 
tively more rapid decrement in the digit span 
for certain-length messages; and (c) a decrease 
in variability in digit-span scores. 

4. Grouping of the presented digits dif- 
ferentially affects performance with known- 
and unknown-length messages. There is 
little effect of grouping on certain-length 
messages, but significant grouping effects 
are observed with uncertain-length messages. 

5. Control tests under moderate rates of 
presentation yield equivalent scores for 
auditory and visual presentation for equivalent
experimental conditions.

6. The information per digit only slightly 
influenced the digit-span scores for certain- 
and uncertain-length messages. 


The results are interpreted in terms of 
two factors: proactive interference and the 
behavioral strategy employed by Ss. 
THE ASSOCIATION VALUE OF RANDOM SHAPES¹

It is well known that verbal materials
vary in meaningfulness or association
value, and that these variations
are related to learning and retention.
Patterns and shapes may vary similarly;
however, little systematic control
over such variation has been
exercised in studies of perceptual 
learning and retention. Indeed, little 
effort to standardize such materials 
has been expended (cf., e.g., the 
discussions by Hilgard [1951, p. 547]
and Graham [1951, pp. 911-915]).

A number of experiments have 
appeared in which random shapes 
have been used as stimuli in tasks 
involving perceptual learning and 
retention. Random shapes have also 
been employed in studies of mediated 
transfer and “predifferentiation.”² In
most of these studies, control has 
not been exercised over possible effects 
of association value of the shapes 
upon performance of Ss. The present 
experiment was undertaken to provide 
a pool of random shapes with known 
association value for use in studies 
of the effects of certain stimulus 
variables and pretraining upon recog- 
nitive performance. It was consid- 
ered desirable to provide for variation 
of association value and stimulus 


1 This research was supported in part by
the United States Air Force under Contract 
No. AF 41 (657)-47, monitored by the Opera- 
tor Laboratory, Air Force Personnel and 
Training Research Center, Lackland Air 
Force Base, Texas. Permission is granted 
for reproduction, translation, use, and dis- 
posal in whole or in part by or for the United 
States Government. 

2 A discussion of experiments on prediffer- 
entiation and the effects of meaningfulness 
may be found in the article by Arnoult 
(1957). 
complexity, in order to assess the
interaction of these two variables
as determinants of recognition.


METHOD 


Materials.—The stimuli were 180 random
shapes, 30 for each of 6 levels of complexity,
as defined by Attneave (1957).3 Each shape
was constructed by first plotting points,
selected by use of a table of random numbers,
on a 100 × 100 grid, and then connecting
them according to the following rules: (a)
The most peripheral points were first con-
nected to form a convex polygon. (b) The
interior points were then chosen at random
and connected one at a time to the sides,
which also were labeled and chosen randomly.
(c) After each connection as above, the line
which defined the side to which the last point
was connected was removed and the process
repeated for the next point. Either 4, 6, 8,
12, 16, or 24 points were used for a given
shape.
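
The construction rules (a) to (c) above can be sketched in Python. This is only an illustrative reading of the description, not the authors' procedure: a pseudo-random generator stands in for the table of random numbers, the function names are hypothetical, and no check is made for lines that cross one another, a point the text does not address.

```python
import random


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def random_shape(n_points, rng, grid=100):
    """Return the ordered vertices of one random shape with n_points vertices."""
    # Plot n_points distinct random points on the grid.
    points = set()
    while len(points) < n_points:
        points.add((rng.randrange(grid), rng.randrange(grid)))
    # Rule (a): the most peripheral points form a convex polygon.
    outline = convex_hull(points)
    on_hull = set(outline)
    # Rule (b): interior points are taken in random order.
    interior = [p for p in sorted(points) if p not in on_hull]
    rng.shuffle(interior)
    for p in interior:
        # A side is chosen at random; side i joins outline[i] to the next vertex.
        i = rng.randrange(len(outline))
        # Rule (c): the chosen side is replaced by two lines through the point.
        outline.insert(i + 1, p)
    return outline


if __name__ == "__main__":
    rng = random.Random(1959)  # arbitrary seed, for repeatability only
    for n in (4, 6, 8, 12, 16, 24):
        print(n, random_shape(n, rng))
```

Inserting the interior point between the two ends of the chosen side is what "removing the line which defined the side" amounts to when the outline is held as an ordered list of vertices.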

This method corresponds to Method 1 of 
a series of methods suggested by Attneave 
and Arnoult (1956) for the construction of 
random shapes. Each shape was photo-
graphed (black on white) and mounted on
4 × 5-in. hardboard.

Arrangement of materials.—An arbitrary 
identification number and an arbitrary rank 
were assigned to each shape and were punched 
on an IBM card and on the back of each 
photograph. The IBM cards were then 
shuffled in haphazard fashion to obtain a 
sequence of the 180 shapes for presentation 
to S. Fifty such sequences were similarly 
obtained and listed by means of an IBM 
Model 405 tabulator. A coefficient of con- 
cordance was computed by comparing the 
mean rank of each shape in the sequences 
with the mean rank expected from a random 
sequence. This procedure yielded a value 
significant at the 22% level. The haphaz- 
ardly constructed sequences were thus also 

3 Complexity refers to the number of points
which determine inflections on the perimeter 
of the shape. Attneave found a linear rela- 
tion between the logarithm of the number of 
points and judged complexity. 
considered to be random for the group of Ss
used in the experiment. 
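
A check of this kind can be sketched with Kendall's coefficient of concordance W. The text does not give the formula that was used, so the standard definition is assumed here: W = 12S / (m²(n³ − n)), where m is the number of sequences, n the number of shapes, and S the sum of squared deviations of the rank totals from their mean. The function names and the seed are hypothetical.

```python
import random


def shuffled_rankings(n_items, n_sequences, seed=0):
    """Build n_sequences random orders; ranks[i] is the position of item i."""
    rng = random.Random(seed)
    rankings = []
    for _ in range(n_sequences):
        ranks = list(range(1, n_items + 1))
        rng.shuffle(ranks)
        rankings.append(ranks)
    return rankings


def kendalls_w(rankings):
    """Kendall's W for m complete rankings of the same n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # 50 haphazard sequences of 180 shapes, as in the study.
    print(kendalls_w(shuffled_rankings(180, 50)))
```

W is 1 when every sequence places the shapes in the same order and falls toward 0 when the orders are unrelated, which is the outcome wanted for presentation sequences.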

Apparatus.—The presentation apparatus
was a shadow-box arrangement with a 4 × 5-
in. window at the back. The photographs
were placed on an inclined ledge behind the
open window during exposure and were
changed manually. The interior of the box
(into which S looked) and all areas in front
of the window were painted flat black. A
60-w. incandescent lamp was located above
and behind the window to yield an illumina-
tion of about 35 ft.-c. on the surface of the
cards, without glare. Exposure time for
each card was 3 sec. Viewing distance was
about 30 in. The S's responses were recorded
by means of a wire recorder and concealed
microphone.

Subjects.—The Ss were 50 volunteer
university students, 33 men and 17 women. 
Four Ss of the original group were discarded 
and replaced, three because they were unable 
to follow directions and one because he failed 
to complete the task. 

Procedure.—After being seated in the 
darkened experimental room, S was told: 

“I am going to show you a number of
shapes. I will show the shapes to you during 
the period between buzzes, like this [two 
samples shown]. You will have the time 
between buzzes to look at each shape. Some 
of the shapes may remind you of some familiar 
object or situation while others may not 
remind you of anything. Your job will be 
to name whatever the shape reminds you of, 
if anything. Some of the shapes may remind 
you of some object or situation, but you may 
not be able to describe it in the short time 
during which you see the shapes. If the 
shape reminds you of something that you 
can describe in a word or two simply say that 
word or phrase. If the shape reminds you 
of something, but you cannot describe it in a 
word or two, say simply, ‘Yes.’ If, of course, 
the shape doesn’t remind you of anything, 
say, ‘No.’ It is important that you say 
something, either a word, if the shape reminds 
you of something that you can describe, or 
‘Yes,’ or ‘No’ for each shape that you see. 
Questions?” 


RESULTS AND DISCUSSION


The Ss’ responses were punched
into IBM cards, which were then
sorted and tabulated to obtain the
following data: (a) the number of
Ss making associative responses to
each shape, (b) the verbal content
of each response, and (c) the number of
shapes having each frequency as in
(a) above.


FIG. 1. Distribution of association values
of 180 shapes.

Association value of each shape 
was the percentage of the Ss making 
“Yes” or content responses to the
shape. The range of percentages 
thus obtained was between 20% and 
62% with a mean at 38%. A fre- 
quency distribution of the 180 shapes 
was plotted and is shown in Fig. 1. 
The distribution was approximated 
by fitting a Gaussian curve to the 
data, and it may be seen that the 
fit is a reasonably good one, with the 
exception of the slight skewness of 
the data, as shown by the excess of 
scores at the low end. 

Consideration of the above results 
indicates that none of the 180 shapes 
was completely devoid of associations. 
The lowest value was 20%. Further, 
no shape evoked associative responses 
from the entire group of Ss. This 
latter result is interesting in view 
of the strong resemblance of some of 
the simpler shapes (4 and 6 points) 
to geometric forms such as triangles, 
rhomboid figures, and trapezoids.
Perhaps our Ss were unfamiliar with 
the names of some of these figures 
and therefore could not report appro- 
priately. On the other hand, those 
forms did not evoke “Yes” responses
either. A third finding of interest was 
the compression of the distribution 
around the mean with spread at the 
extremes. This finding might be 
interpreted as indicating that the 180 
shapes have associative characteristics 
in a “low-medium” range and that
they are somewhat more homogeneous 
than, for example, nonsense syllables, 
with respect to associations. 

The shapes scaled in the study are 
shown in Fig. 2-7. Data concerning 
association value for each shape as 
well as the proportion of associative 
responses which were verbal content 
responses other than “Yes” and an 
index of the diversity, or hetero- 
geneity, of the content responses, 
are shown in Table 1. The associa- 
tion value (A) is the percentage of the 
50 Ss responding to the shape with the 
word “Yes” or a verbal content word. 
The content value (C) is the propor-
tion of the total percentage of responses
which were words or phrases denoting 
associations with objects or situations. 
The heterogeneity index (H) is the 
mean amount of information per 
content response, computed from 
the entropy formula proposed by 
Shannon and Weaver (1949) as a 
measure of information.4
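
The three indices can be sketched for a single shape as follows. The code is an illustration under stated assumptions: the response coding ("Yes", "No", or a content word), the function name, and the sample responses are hypothetical, and base-2 logarithms are assumed for H because the text does not state the base.

```python
import math
from collections import Counter


def shape_indices(responses):
    """Return (A, C, H) for one shape from the list of Ss' responses.

    A: percentage of Ss answering "Yes" or giving a content word.
    C: proportion of those associative responses that were content words.
    H: entropy of the content responses, in bits (base 2 is an assumption).
    """
    n = len(responses)
    content = [r.lower() for r in responses if r not in ("Yes", "No")]
    n_assoc = sum(1 for r in responses if r != "No")
    a = 100.0 * n_assoc / n
    c = len(content) / n_assoc if n_assoc else 0.0
    h = 0.0
    for count in Counter(content).values():
        p = count / len(content)
        h -= p * math.log2(p)
    return a, c, h


if __name__ == "__main__":
    # Hypothetical responses of 10 Ss to one shape.
    sample = ["No"] * 5 + ["Yes"] + ["kite", "kite", "sail", "triangle"]
    print(shape_indices(sample))
```

H is zero when every content response names the same object and grows as the content responses spread over more classes.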

A preliminary inspection of the 
data indicated that shapes of high 
complexity tended to evoke fewer 
associative responses than did those 
of low complexity. To check this 
observation, a contingency analysis 
was conducted. The distribution was 
cut at approximately equal tercile 
levels to define shapes of high, 
medium, and low association values. 
Each of these categories was then 
split into six subcategories in terms 
of complexity, with the result as 
shown in Table 2. The obtained 
contingency chi square was 32.31 
(P < .001). 
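
The contingency analysis can be sketched as a chi-square test on a 3 × 6 table, with (3 − 1)(6 − 1) = 10 degrees of freedom. The cell counts below are hypothetical stand-ins chosen only so that each complexity level holds 30 shapes; they are not the values of Table 2, and the function name is likewise an assumption.

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


if __name__ == "__main__":
    # Hypothetical counts: rows are high, medium, and low association value;
    # columns are 4, 6, 8, 12, 16, and 24 points.
    counts = [
        [16, 14, 11, 9, 6, 4],
        [9, 10, 10, 10, 11, 10],
        [5, 6, 9, 11, 13, 16],
    ]
    print(chi_square(counts))
```

An inverse relation between complexity and association value appears in such a table as an excess of simple shapes in the high row and of complex shapes in the low row.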

It may be seen from Table 2 that 
there is a larger number of shapes 


4 $H = -\sum_i p_i \log p_i$, where $p_i$ is the prob-
ability (proportion) of content response of
the $i$th class.
TABLE 2


NUMBER OF SHAPES OF HIGH, MEDIUM, AND
LOW ASSOCIATION VALUE ARRANGED
IN ORDER OF COMPLEXITY FOR
CONTINGENCY ANALYSIS

(Rows: association value, high, medium, and low; columns: number of points in shapes.)


of high complexity in the low associa- 
tion value category than in the high 
category, while the reverse is true 
for the simple shapes. This finding 
would tend to indicate an inverse 
relation between complexity and asso- 
ciation value for the 180 shapes. 

Further preliminary inspection of 
the data of Table 1 and of the cata- 
logued responses indicated that shapes 
of low complexity evoked not only 
more content responses and fewer 
“Yes” responses, but also responses which were
reflective of their resemblance to 
objects. Shapes of greater complexity 
seemed to evoke responses of greater 
variety of content, in the sense that 
they did not reflect clear resemblances 
to objects. It may be that this lack 
of resemblance resulted in responses 
of greater “projective” quality (e.g.,
the 24-point shape No. 30 evoked 
the responses church, nun, branch, 
and city, while the 4-point shape No. 
17 evoked the responses triangle, 
kite, sail, and pyramid). 

To investigate these relations some- 
what more precisely, the correlations 
between the pairs of variables were 
calculated, and these are shown in 
Table 3. It may be seen that the 
relation between Complexity (N) 
and Content (C) is an inverse one 
(r = —.34), while the relation be- 
tween Complexity and Association 
Value (A) and between Complexity 
TABLE 3


INTERCORRELATIONS OF ASSOCIATION VALUE
(A), COMPLEXITY (N), HETEROGENEITY
(H), AND CONTENT (C), FOR THE
180 SHAPES


and Heterogeneity (H) are also nega-
tive ($r_{NA} = -.19$, $r_{NH} = -.22$). This
would seem to indicate that as the 
shapes decrease in complexity they 
tend to evoke more associations, and 
ones which are more likely to be of 
greater content and heterogeneity as 
well. The relations between the other 
variables are positive and seem to indi- 
cate a general tendency for shapes of 
high association value to evoke re- 
sponses of greater heterogeneity as 
well as content. This relation might 
be expected to occur, since there is 
greater likelihood of responses to be 
different as more persons respond to 
a shape, provided only that the shape 
does not clearly resemble a common 
object, as, say, a photograph of the 
object would. 


Whatever interpretation one may make 
of the results, it is clear that random 
shapes vary with respect to the number 
and kind of associations they elicit. 
These variations may be related to the 
ease of learning and retention of the 
shapes. Association value has been
treated most often in a qualitative way 
in studies of perceptual learning, and 
a more thorough account might be 
made of the variety of existing results 


JAMES M. VANDERPLAS AND EVERETT A. GARVIN


in this area. By use of materials of
known association value, control of this
variable might be exercised, with cor-
responding clarity of the basis for
results.


SUMMARY


Association value, content, and hetero-
geneity of associative responses were deter-
mined for 180 random shapes of varying
complexity (number of points). Tabulations
of these variables and correlations between 
them are presented. The results indicate 
a range of association value from 20% to 
62% for the shapes examined. An inverse
relation was noted between the complexity 
of the shapes and the number, content, 
and heterogeneity of associative responses, 
while a positive relation exists among the 
other variables, for the shapes studied. The 
shapes presented form a pool of materials 
which may be used in studies of perceptual 
learning and retention in which control of 
association value is desirable. 


REFERENCES 


ARNOULT, M. D. Stimulus predifferentia-
tion: Some generalizations and hypotheses.
Psychol. Bull., 1957, 54, 339-350.

ATTNEAVE, F. Physical determinants of the
judged complexity of shapes. J. exp.
Psychol., 1957, 53, 221-227.

ATTNEAVE, F., & ARNOULT, M. D. Method-
ological considerations in the quantita-
tive study of shape and pattern perception.
Psychol. Bull., 1956, 53, 452-471.

GRAHAM, C. H. Visual perception. In S. S.
Stevens, Handbook of experimental psy-
chology. New York: Wiley, 1951. Pp.
868-920.

HILGARD, E. R. Methods and procedures
in the study of learning. In S. S. Stevens,
Handbook of experimental psychology. New
York: Wiley, 1951. Pp. 517-567.

SHANNON, C. E., & WEAVER, W. The
mathematical theory of communication. Ur-
bana: Univer. of Illinois Press, 1949.


(Received March 17, 1958) 





Journal of Experimental Psychology 
Vol. 57, No. 3, 1959 


COMPLEXITY, ASSOCIATION VALUE, AND PRACTICE AS 
PAIRED-ASSOCIATES TRAINING 1


Studies of transfer of predifferen-
tiation training have indicated that
learning verbal labels for stimuli in a 
preliminary task facilitates perform- 
ance of a new task involving the same 
stimuli but new, different responses. 
Positive transfer in these kinds of 
tasks has been accounted for on the 
basis of postulates that the stimuli 
become (a) more “distinctive” as a
result of attachment of the labels
(e.g., Goss, 1955) or (b) “differen-
tiated” as a result of reduction of
interstimulus generalization gradients
(Gibson & Gibson, 1955). Although
studies which involve transfer of such
verbal pretraining to tasks employing
discriminative verbal or motor re-
sponses have supported these postu-
lates, more direct tests, of improve-
ment in discrimination or recognition
following such practice in labeling,
have yielded negative or ambiguous
results.2

Divergences in some of these latter 
studies have been accounted for by 
reference to possible variations in the 
meaningfulness of the labels employed 


1 This research was supported in part by
the United States Air Force under Contract
No. 41 (657)-47, monitored by the Operator
Laboratory, Air Force Personnel and Training
Research Center, Lackland Air Force Base, 
Texas. Permission is granted for reproduc- 
tion, translation, publication, use, and dis- 
posal in whole and in part by or for the United 
States Government. 

2 For a discussion of relevant studies and
hypotheses associated with them, see the 
paper by Arnoult (1957); a discussion of some
of the relations between experiments on 
transfer and recognition is contained in the 
paper by Vanderplas (1958). 


and in the amount of practice on the 
labeling task (Arnoult, 1956). How- 
ever, a recent summary (Arnoult, 
1957) of relevant studies indicates 
that ambiguities still exist when 
these variables are taken into account. 
Positive results have been obtained, 
for example, when the labels were not
meaningful (Arnoult, 1956), and nega-
tive results have been obtained when
meaningfulness of the labels was
varied over a considerable range
(Dysinger, 1951). Moreover, extended 
practice has in some instances failed 
to yield significantly greater improve- 
ment than that obtainable with small 
amounts (Arnoult, 1957, pp. 341-
342).

It is possible that the divergences 
cited above are in part due to interac- 
tions of the several variables. Amount 
of practice, for example, may interact 
with the initial difficulty of the 
stimulus material, the meaningfulness 
of the labels, or both. Also, in cases 
where positive results have been 
obtained the effect may have been 
due to the meaningfulness not of the 
response labels as such, but of the 
stimuli. In few studies of this variety 
has there been rigorous control over 
these variables, nor has there been 
systematic variation of them in the 
experiments (cf. also the discussion 
by Arnoult [1957, pp. 346-348] 
for some of the difficulties involved). 

In view of these divergences and 
alternatives, an experiment to deter- 
mine the independent and interactive 
effects of these variables seemed in 
order. A factorial experiment incor-
porating these variables would permit
the assessment of the independent 
contribution of each as well as their 
interactive effects. Further, if, as 
suggested here, stimulus meaning- 
fulness is important, it should be 
effective even when response meaning- 
fulness is absent. 

The experiment therefore took the 
form of a factorial design, with three 
independent variables: stimulus com- 
plexity, meaningfulness, and amount 
of labeling practice. Complexity was 
used to vary the initial discrimina- 
bility of the stimuli and was defined 
as the number of points which deter- 
mined inflections on the perimeter of 
straight-sided random shapes. This 
definition was selected on the basis 
of work by Attneave (1957), who 
found judgments of complexity to be 
related positively to this variable. 
Other studies have found complexity,
defined in terms of amount of informa-
tion (Attneave, 1955; Fitts, Wein-
stein, Rappaport, Anderson, & Leon-
ard, 1954) in a form or pattern, in
terms of length of verbal stimuli
(McGinnies, Comer, & Lacey, 1952),
and in terms of number for dot
patterns (French, 1954) to be related
to discriminative and recognitive per-
formance. Complexity thus would
appear to be related inversely to the 
relative size of detail of the stimulus 
and similarly to discriminability. Its 
independent relation to recognition 
following labeling practice and its
interaction with amount of practice 
and meaningfulness is, of course, an 
empirical question for the present 
study. 

Meaningfulness was defined by the 
association value of the stimuli, in 
terms of the percentage of Ss in an 
independent group making associa- 
tive responses (Vanderplas & Garvin, 
1959). This definition has a common 
basis in studies of verbal learning 


JAMES M. VANDERPLAS AND EVERETT A. GARVIN 


and is similar to that used by Glaze 
(1928) for nonsense syllables. Prac- 
tice was defined as the number of 
trials in a labeling task. 


METHOD 


Experimental design.—Four Ss were as- 
signed at random to each of 36 conditions of 
the experiment. Three levels of complexity 
(6-, 12-, or 24-point random shapes), three 
levels of meaningfulness (28%, 38%, or 48%
median associative frequency) and four levels 
of practice (2, 4, 8, or 16 trials) were em- 
ployed. A total of 144 volunteer university 
students served as Ss. 
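
The layout of the design can be sketched as follows: 3 × 3 × 4 = 36 conditions, with 4 Ss per condition, gives the 144 Ss reported. The assignment routine, its names, and the seed are assumptions for illustration; the text says only that Ss were assigned at random.

```python
import itertools
import random

COMPLEXITY = (6, 12, 24)         # number of points in the shapes
ASSOCIATION_VALUE = (28, 38, 48)  # median associative frequency, per cent
PRACTICE = (2, 4, 8, 16)          # labeling trials


def assign_subjects(n_per_cell=4, seed=0):
    """Randomly assign subject numbers to every cell of the factorial design."""
    conditions = list(itertools.product(COMPLEXITY, ASSOCIATION_VALUE, PRACTICE))
    subjects = list(range(1, len(conditions) * n_per_cell + 1))
    random.Random(seed).shuffle(subjects)
    return {
        cond: subjects[i * n_per_cell:(i + 1) * n_per_cell]
        for i, cond in enumerate(conditions)
    }


if __name__ == "__main__":
    design = assign_subjects()
    print(len(design), "conditions;", sum(len(s) for s in design.values()), "Ss")
```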

The experiment was conducted in two 
parts: In the first part (paired-associates task) 
S attempted to learn a nonsense syllable label 
for each of a set of eight random shapes. 
For a given S all shapes were of the same 
complexity and a narrow range of association 
values. In the second part S was given a 
recognition test in which he attempted to 
select the shapes for which he learned the 
labels from a group of shapes which contained 
variations of these “prototypes.”3

Apparatus.—The materials for the paired-
associates task were presented in a shadow-
box, open at the front and containing a
4 × 5-in. window at the back. Cards con-
taining the shapes and syllables were placed
on a ledge behind the window during exposure 
and were changed manually. Illumination 
was provided by a fluorescent lamp above 
the window and behind the shadow-box. 
The illumination was about 35 ft.-c. on the 
surface of the cards, without glare. Viewing 
distance was about 30 in. 

The recognition test cards were presented 
on a 6 × 22-in. frame, mounted on a table
and inclined at an angle of 10° from vertical. 
Cards were exposed by lifting each in turn 
from behind the frame and placing it quickly 
on the frame. When S responded the card 
was quickly removed. Exposure time was 
thus approximately equal to S's response 
time. Latency and type of response were 
recorded by a second E, using a stop watch 
with .1 sec. accuracy. Illumination was 
provided by a 60-w. incandescent lamp, and 
was at a level of about 35 ft.-c. on the surface
of the cards, without glare. Viewing distance 
was about 30 in. 

Paired-associates task.—The stimuli were 
random shapes, constructed according to the 


3 The prototype is the shape for which S
attempted to learn the label. 
method of Attneave and Arnoult (1956,
p. 454). Labels were taken from the 0% 
list of Glaze (1928). The shapes were photo- 
graphed and mounted on hardboard, and 
the labels were typed on paper and glued to 
the photographs above the shapes. Sample 
lists are shown in Fig. 1. The shapes and 
labels were presented in the window of the 
shadow-box at a 2-sec. rate, with learning 
by the anticipation method. Four orders of 
presentation were used to avoid serial effects. 
For each shape there was presented first 
the shape alone and then the shape and label 
together. Labels were spelled by S. Cor- 
rect anticipations and errors were recorded. 

Recognition test.—Recognition test items
for a given S were 16 cards (5 × 28 in.),
upon each of which was mounted a photo- 
graph of five shapes, arranged in a row. On 
eight of the items one of the shapes was a 
prototype and the other four were variations. 
On the remaining eight items all five shapes 
were variations. Items were constructed by 
first duplicating the prototype; then each 
point was moved up, down, right, or left at 
random through a constant distance, and 
the points connected as in the construction 
of the prototype. Sample items of the recog- 
nition test are shown in Fig. 2. 
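
The construction of a variation can be sketched as below. The size of the constant distance is not stated in the text, so `distance` is a hypothetical parameter, as are the function name and the sample prototype; connecting the moved points in the prototype's order is assumed to be what "as in the construction of the prototype" implies.

```python
import random

# Unit steps for up, down, right, and left.
STEPS = ((0, 1), (0, -1), (1, 0), (-1, 0))


def make_variation(prototype, distance, rng):
    """Move every vertex of the prototype in one random direction.

    The vertex order is kept, so the moved points are connected in the
    same sequence as in the prototype.
    """
    varied = []
    for x, y in prototype:
        dx, dy = rng.choice(STEPS)
        varied.append((x + dx * distance, y + dy * distance))
    return varied


if __name__ == "__main__":
    rng = random.Random(2)  # arbitrary seed
    prototype = [(10, 12), (80, 5), (95, 60), (40, 90), (5, 55), (50, 40)]
    print(make_variation(prototype, distance=5, rng=rng))
```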

Items of the recognition test were pre-
sented in random order. The S was in-
structed to point to one of the five shapes
and say “That one” if he believed it to be one
on which he had practiced or to say “None”
if he believed none of the shapes to be one
on which he had practiced. Speed and
accuracy were stressed. Number and time
of response, for the following five types, were
recorded: AA, correct selection of a proto-
type shape; AB, incorrect selection of a
variation; AN, incorrect rejection of all
shapes when one of them was a prototype;
NA, incorrect selection of a shape when all
were variations; and NN, correct rejection
of all shapes when all were variations.


FIG. 1. Sample lists used in the paired-
associates task.


FIG. 2. Sample items from the
recognition test.


RESULTS 


Learning task.—The means and
SD's of the number of correct antici-
pations of the labels are shown in
Table 1. It may be seen that the
learning scores were fairly low, though 
there seems to be a progressive im- 
provement as a function of practice. 
The mean number of correct antici- 
pations ranges from .028 after two 
trials to 3.4 after 16 trials. No test 
of the significance of this trend was 
made, however, since the data of 
primary interest were the scores 
on the recognition test. 

Recognition test.—The mean num- 
ber and time of responses for the 
data representing correct recognitions 
(AA), correct rejections (NN), and 
incorrect rejections in the case of 
items containing the prototype (AN) 
are shown in Table 2. Separate 
analyses of variance were carried out 
for each response type,4 and the

4 The mean of the number of incorrect re-
sponses AB and NA may be inferred from the 


TABLE 1


MEANS AND SD's OF CORRECT ANTICIPATIONS DURING PAIRED-ASSOCIATES
TASK, FOR EACH CONDITION OF THE EXPERIMENT


TABLE 2


MEAN NUMBER (N) OF CORRECT RECOGNITIONS (AA RESPONSES), CORRECT REJECTIONS
(NN RESPONSES), AND INCORRECT REJECTIONS (AN RESPONSES) AND MEAN
TIME (t) OF RESPONSE FOR EACH RESPONSE TYPE AND EACH
CONDITION OF THE EXPERIMENT


results of these analyses are shown
in Tables 3 and 4. 

From Table 2 it may be seen that 
complexity of the shapes (number 


data for the AA, NN, and AN responses, since 
all must sum to 16 for each condition of the 
experiment. Thus, AA + (AB + AN) = 8
and NN + NA = 8. The reason for choosing 
the AN responses rather than the AB re- 
sponses for analysis is indicated in the 
discussion. 
of points) is inversely related to
correct recognition and positively 
related to time of response. The 
mean number of correct recognitions 
(AA responses) drops from 2.8 for the 
6-point shapes to 2.1 for the 24-point 
shapes, while mean time of response 
increases from 2.9 sec. to 4.3 sec. over 
the same range of complexity (a 
similar trend for time of response 


TABLE 3


ANALYSIS OF VARIANCE OF SCORES ON RECOGNITION TEST,
FOR EACH RESPONSE TYPE
(Number data)

Sources: Complexity (C); Association value (AV); Practice (P);
C × AV; C × P; AV × P; C × AV × P; Error.

* P < .05.
** P < .01.



may be seen for the NN and AN 
responses). The trend for the num-
ber data resulted in a significant
variance in the analysis (see Table 3:
F = 3.22, P < .05) of the AA re- 
sponses, but the time trend was not 
found to be significant (Table 4). 
There appears also to be a tendency 
for correct rejections to decrease 
as the complexity of the shapes 
increases; however, the effect was 


not found significant in the analysis
of variance. 

Association value appears to be 
positively related to correct recogni- 
tion but inversely related to correct 
rejections. The mean number of AA 
responses for the shapes of high, 
medium, and low association value 
were 2.7, 2.5, and 2.4, respectively, 
while the mean number of NN 
responses were 2.6, 2.7, and 3.1. 


TABLE 4


ANALYSIS OF VARIANCE OF SCORES ON RECOGNITION TEST,
FOR EACH RESPONSE TYPE
(Time data)

Sources: Complexity (C); Association value (AV); Practice (P);
C × AV; C × P; AV × P; C × AV × P; Error.

* P < .05.
** P < .01.


However, neither of these effects was
significant (see Table 3). 

Practice, within the limits of the 
experiment, was not significantly re- 
lated to variance of any of the re- 
sponse types. However, there seems 
to be a slight tendency for correct 
recognitions to increase as a function 
of practice, while correct rejections 
appear to follow a similar trend. 

It may be seen in Table 2 that 
effects of complexity interact with 
practice and (indirectly) with asso- 
ciation value of the shapes. The 
interaction variance for complexity 
and practice (C × P, Table 3) is
significant at the .01 level, and while 
the interaction of complexity and 
association value is not significant 
(P < .10), the three-way interaction 
is significant at the .01 level for AA 
responses and at the .05 level for NN 
responses. This may be interpreted 
to mean that the C × P interaction
depends upon the level of association 
value examined. In Table 2, correct 
selections of shapes of both high and 
medium association value seem to 
decrease fairly regularly; however, 
there is a reversal of this trend for 
the 12-point, low association value 
shapes (see Table 2). Correct re- 
jections, on the other hand, seem 
to follow a fairly uniform trend over 
the entire range of complexity, re- 
gardless of the level of association 
value. 


DISCUSSION 


The results lend partial support to 
the hypothesis that the initial difficulty 
of the material determines in part the 
effects of practice on recognition. Al- 
though there did appear to be a small 
effect of practice on improvement of the 
recognition score (Table 2), its effect, 
if any, was manifested only indirectly 
by way of its interaction with com- 
plexity. Complexity was the only main 
variable of significance, and it was 
associated with a decrease in both cor-
rect recognitions and correct rejections.
Failure of discrimination, resulting from 
increased complexity of the shapes, 
would be expected to result in more 
confusions of the shapes and their
variations, thus affecting both the learn- 
ing task and the recognition score. 
Although association value of the 
shapes was not significantly related to 
correct recognition, it may be seen from 
Table 3 that it was related significantly 
to incorrect rejections (AN). A graph 
of the data for these responses is shown 
in Fig. 3, along with a similar graph 
for the NA and AB responses (these 
data are calculated from the data of 
Table 2 by use of the formula in Foot- 
note 4). The number of AN responses 
may be seen to decrease markedly as 
association value increases. A similar
effect was noted for the NN responses 
(see Table 2). In Fig. 3 there is also 
seen an increase in both NA and AB 
responses, which represent incorrect se- 
lections. Within the linear restrictions 


FIG. 3. Correct selections and rejections
as a function of association value. (Abscissa: association
value, per cent; ordinate: mean number of responses.)


TABLE 5


DISTRIBUTION OF RESPONSES ON THE RECOG-
NITION TEST FOR SHAPES FOR WHICH THE
LABELS WERE OR WERE NOT COR-
RECTLY ANTICIPATED IN THE
PAIRED-ASSOCIATES TASK





Correct Incorrect 
Labeling — — | —__——— 
mangoes Rejec- | Selec- 
tion tion 


Anticipated 54 | 218 


(66) | (208) 
342 


Rejec- 
tion 


39 

(52) 

Not Antici- 
pated 


1025 | 269 


(256) 











(330) | (1035) 


* Frequency expected on the basis of proportional 
distribution. 


of the experiment, it is possible that 
increases in selection responses, while 
not significant in themselves, might 
serve to depress the number of rejec- 
tions by way of their cumulative effect. 
Although this may seem to be an artifact 
of the method, it accounts in part for 
the significance of the variance in AN 
responses, and it may be subjected to 
test, at least in part. A possibility
which presents itself is that the learning 
of the label for a shape would foster 
more selection responses by way of the 
partial arousal of the labeling response 
during the recognition task (Miller & 
Dollard, 1941). To the extent that 
variations of the shape also served to 
evoke the labeling response, we would 
expect more selection responses to be 
made, both correct and incorrect. 
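
The bookkeeping of Footnote 4, by which the AB and NA means are obtained from the tabled AA, AN, and NN means, can be sketched as follows. The relations AA + AB + AN = 8 and NN + NA = 8 come from the footnote; the function name and the input values in the example are hypothetical and are not taken from Table 2.

```python
def infer_incorrect_selections(aa, an, nn, items_per_kind=8):
    """Infer mean AB and NA responses from the AA, AN, and NN means.

    Each S saw 8 items containing a prototype (AA + AB + AN = 8) and
    8 items made up only of variations (NN + NA = 8).
    """
    ab = items_per_kind - aa - an
    na = items_per_kind - nn
    return ab, na


if __name__ == "__main__":
    # Hypothetical means for one condition.
    ab, na = infer_incorrect_selections(aa=2.5, an=3.0, nn=2.75)
    print("AB =", ab, "NA =", na)
```

Because the totals are fixed, any rise in selection responses (AA, AB, or NA) must be matched by a fall in rejections, which is the cumulative effect considered in the paragraph above.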

Accordingly, we counted the number 
of correct and incorrect selections and 
rejections made by our Ss and arranged 
them according to whether the labels 
were or were not correctly anticipated 
during the learning task. The data are 
shown in Table 5. It may be seen that 
shapes with correctly anticipated labels 
were more often correctly selected, but 
they were also more often incorrectly 
selected as well. If anything, learning 
of the labels resulted in more selections, 
regardless of the correctness of the 
“recognition.” 

It should be noted that the failure to
find clear effects of practice is at variance 
with the results of some studies in which 
practice effects were found (Arnoult, 
1956). However, effects of physical 
properties of the shapes (complexity) 
and their association value may out- 
weigh the practice effects in the present 
experiment. If the interactive effects 
of the present experiment are genuine, 
it means that the several variables may 
operate together in a relatively complex 
way. Studies incorporating additional 
levels of these and other variables would 
be necessary to determine precisely the 
nature of these relations. The present 
findings are primarily demonstrational, 
and dimensional analysis of these effects 
is needed. 


SUMMARY 


Recognition of random shapes varying in 
complexity and association value was studied 
following four levels of practice in labeling. 
Complexity was inversely related to recogni- 
tion of prototype shapes and to correct rejec- 
tion of variations of the prototypes. Asso- 
ciation value was positively related to selec- 
tion responses, both correct and incorrect, 
but not significantly related to correct 
recognitions. Labeling practice was not 
significantly related to any of the response 
types scored. 
THE DISTANCE GRADIENT IN KINESTHETIC 
FIGURAL AFTEREFFECT 


JOHN P. CHARLES AND CARL P. DUNCAN 


Northwestern University 


In a review of FAE (figural after- 
effect) studies in Japan, Sagara and 
Oyama (1957) summarize the results 
of several systematic investigations 
in vision of the phenomenon known 
as the distance paradox or gradient. 
In general, the data support Kohler 
and Wallach’s (1944) finding that 
with increasing distance between con- 
tours of test and inspection figures, 
amount of FAE first increases and 
then decreases. The only important 
discrepancy in the inverted-U gra- 
dient was the consistent finding that 
small amounts of FAE occurred when 
test and inspection contours were 
identical; according to Kohler and 
Wallach, this case should yield zero 
FAE. 


Initially, the present study had been 
planned as an attempt to duplicate, 
in kinesthesis, the distance gradient 
found in vision. An examination of 
the literature revealed, however, 
that as yet there has been no sys- 
tematic investigation of kinesthetic 
FAE which is free of a number of 
possible confoundings. The general 
procedure in studies of kinesthetic 
FAE (Spitz, 1958) is to have S 
estimate the width of a wood block 
(test figure) after having run his 
thumb and fingers repeatedly along 
opposite sides of an inspection block of 
different width. As Kohler and Din- 
nerstein (1947) pointed out, only one 
hand, not both, should be satiated 
(presented the inspection block) ; then 
the satiated hand should grip the 
test block while the judgment of its 
width is made by means of a scale 
manipulated by the other hand (details 
of this procedure are described 
in the Method section). With this 
method, these authors reported find- 
ing kinesthetic FAE paralleling that 
in vision. (Kinesthetic FAE’s are 
measured in terms of degree of error 
in width judgments.) A distance 
gradient was mentioned, but no 
data were presented. (In a foot- 
note, FAE was reported when 
inspection and test block were iden- 
tical, which, again, is at variance with 
Kohler’s belief, later reiterated [Kohler, 
1951], that this should not occur.) 

Wertheimer (1954) pointed out that 
Kohler and Dinnerstein had not 
allowed for constant error in size 
judgments of this type. When, in 
his own data, constant error was 
eliminated, the amount of FAE was 
much less than that reported by 
Kohler and Dinnerstein. However, 
Wertheimer had his Ss satiate with 
both hands, which, as noted, Kohler 
and Dinnerstein claimed should not 
be used for quantitative work. In 
a more recent study Wertheimer and 
Leventhal (1958) again used both-hand 
satiation.1 

1 In a personal communication, Wertheimer 
indicated that his procedure in the Wertheimer 
and Leventhal study (1958) was the same as 
that in an earlier study (Wertheimer, 1954). 
In each case S stood between two inspection 
bars. The left hand rubbed the ½-in.-wide 
inspection bar, while simultaneously, and 
with the same rate of movement, the right 
hand rubbed the 2½-in.-wide inspection bar. 
Both hands made long, sweeping movements. 
At the end of the inspection period S stepped 
forward and gripped a 1½-in.-wide test bar 
with the left hand, while the right hand 
located a point of apparent equal width on a 
variable-width test bar. Wertheimer wrote 
While studies have appeared in 
which kinesthetic FAE was employed 
in other connections (Spitz, 1958), 
no study has been directly concerned 
with the systematic investigation of 
kinesthetic FAE. Therefore, the first 
purpose of the present study was to 
obtain quantitative measurements of 
kinesthetic FAE, unconfounded by 
certain error variables. The second 
purpose was to determine if a dis- 
tance gradient in FAE, similar to that 
in vision, occurs in kinesthesis. 


METHOD 


Apparatus.—The materials were nine 
wooden blocks, 1 ft. long and 2 in. high. The 
blocks varied in width, the widths used being 
⅛, ¼, ½, ¾, 1, 1¼, 1½, 1¾, and 2 in. All of these 
were used as test blocks; the 2-in. block also 
served as the inspection stimulus. 

The measuring device was a wedge of wood, 
40 in. long, 2 in. high, which increased in 
width from 0 to 4 in. at the rate of 1-in. 
increase in width per 10 in. of length. Verti- 
cal sides of blocks and wedge were covered 
with smooth plastic-coated paper to minimize 
textural cues. A tape measure, graduated 
in }-in. units, was glued to the top of the 
wedge. A long iron rod, mounted } in. 
above and parallel to the long axis of the 
wedge, supported a plastic H-shaped slider 
which assured that, when S gripped the 
wedge, his thumb and big finger were in a 
plane at right angles to the center line of the 
wedge. The slider also permitted accurate 
reading of the scale. 
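
The wedge geometry described above implies a simple linear conversion between the slider position and the judged width. The following is a minimal sketch of that conversion, assuming that position is measured in inches from the narrow (zero-width) end of the wedge; the function names are illustrative and do not come from the original report.

```python
# Sketch of the wedge scale: width grows from 0 to 4 in. over 40 in. of length,
# i.e., 1 in. of width per 10 in. of length (values taken from the Apparatus text).
WEDGE_LENGTH_IN = 40.0
WEDGE_MAX_WIDTH_IN = 4.0


def wedge_width(position_in: float) -> float:
    """Width of the wedge (in.) at a given distance from the narrow end (assumed origin)."""
    if not 0.0 <= position_in <= WEDGE_LENGTH_IN:
        raise ValueError("position lies off the wedge")
    return position_in * WEDGE_MAX_WIDTH_IN / WEDGE_LENGTH_IN


def wedge_position(width_in: float) -> float:
    """Distance from the narrow end (in.) at which the wedge has the given width."""
    if not 0.0 <= width_in <= WEDGE_MAX_WIDTH_IN:
        raise ValueError("width is outside the range of the wedge")
    return width_in * WEDGE_LENGTH_IN / WEDGE_MAX_WIDTH_IN


if __name__ == "__main__":
    # A 2-in. block would be matched, with no error, 20 in. from the narrow end.
    print(wedge_position(2.0), wedge_width(10.0))
```

Under this assumption an error of 1 in. in slider position corresponds to an error of 0.1 in. in judged width.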

The blocks were held rigid at both ends 
in a block holder. The block holder and 
scale (wedge) were placed on sturdy tables, 
28 in. high, placed parallel to each other and 
20 in. apart. A meter stick clamped to the 
wedge table permitted accurate location of 
the wedge whenever it was moved. 

that he satiated both hands in each study 
in order to get as clearly measurable FAE 
as possible. When the test bar is intermediate 
in width compared to the two 
inspection bars, as in the Wertheimer studies, 
FAE will generally be larger with both-hand 
satiation than if only one hand is satiated 
while the other is used merely for estimating 
widths. 

Subjects.—A total of 220 naive Ss from 
introductory psychology courses, rewarded 
with one examination point, were used. The 
first 180 Ss were assigned in turn to nine 
groups, 20 Ss for each of the nine widths 
of test blocks. The next 20 Ss were assigned 
to a control group and the last 20 to a re-run 
of one of the nine experimental conditions. 

Procedure.—The Ss were tested individually, 
completing the experiment in one session. 
They were blindfolded throughout the experi- 
ment and were read instructions to the effect 
that the experiment concerned their ability 
to judge sizes or widths of blocks. They 
were told this would be done with thumb 
and big finger of each hand, one hand gripping 
a block while the preferred hand found a 
location of apparent equal width on the wedge. 
Instructions were given where necessary 
at each of the steps of procedure listed below. 

On entering the experimental room, S was 
led between the two tables. For both left- 
and right-handed Ss, the scale was presented 
to the preferred hand, the blocks to the other 
hand. With assistance from E, S was in- 
structed to grip the wedge with one hand 
and whatever block was present with the 
other hand. He was instructed to use only 
thumb and big finger of each hand, to press 
firmly, but not so hard as to hinder movement 
of hands back and forth along opposite sides 
of block and wedge. A demonstration was 
given of “zeroing in,” i.e., successively less 
and less overshooting of the point of apparent 
equality, for finding the width on the wedge 
that matched the width of the block in the 
other hand. The S was instructed to remove 
his hands from block and wedge after com- 
pleting each judgment. 

The S was warned that the entire wedge 
would be moved between each measurement 
(the wedge was moved randomly within a 
10-in. limit), and was encouraged to step 
back and forth freely during judgments. 
This was done to minimize the use of spatial 
position cues, such as the position of the arm, 
in judging. At the start of each judgment 
the slider was set alternately at the extreme 
ends of the scale. Step by step, the procedure 
was as follows: 

Step 1: Test block (4, },...2 in.). 
Eight judgments were made on the test block 
to which S had been assigned. The first 
four of these trials were practice judgments 
during which S was encouraged to complete 
the judgment in about 15 sec. and any ques- 
tions were answered. The next four trials 
constituted the measure of PSE, and thereby 
the constant error, for each S. 

Step 2: Inspection block, first satiation 
period. The S moved the thumb and big 
finger of his nonpreferred hand back and 
forth along the sides of the 2-in. inspection 
block, pressing firmly, for a period of 1 min. 
at the rate of approximately 100 movements 
(50 double traverses of the block) per minute. 

Step 3: Test block (same block as in 
Step 1). The S immediately made four judg- 
ments of the test block to which he had been 
assigned. This is the first measure of FAE. 

Step 4: 1 min. rest. The S stood with 
hands at side. 

Step 5: Same as Step 2. This is the 
second satiation period. 

Step 6: Same as Step 3. This is the 
second measure of FAE. 

In addition to the scale-readings, E 
recorded the number of moves (single 
traverses of the inspection block) made during 
the inspection period, the time to make each 
judgment, and the time between judgments 
(the latter depended on what E had to do 
between trials). 

After the nine experimental groups had 
been completed, a control group of 20 Ss, 
drawn from the same undergraduate classes 
as the experimental groups, was put through 
the steps listed above but without any 
satiation period on the 2-in. block. Instead 
of satiation (Steps 2 and 5), the control Ss 
simply rested for 1 min. The control Ss were 
tested (Steps 1, 3, and 6) on the 1-in. block. 

A final group of 20 Ss was run as an exact 
repeat of the }-in. experimental group (the 
reason for this will appear later). 


RESULTS 


The terms Pretest, Test I, and 
Test II will be used to refer, respec- 
tively, to the four judgments of the 
test block width preceding the first 
satiation period (Step 1), the four 
judgments following the first satiation 
period (Step 3), and the four follow- 
ing the second satiation period (Step 
6). 

For each S, the mean of each set 
of four trials was computed and 
used as a raw score. The score on 
the Pretest is the PSE; the difference 
between it and the physical width of 
a block (physical width subtracted 
from PSE) is the constant error. 
The difference between the Pretest 
score and the score on Test I (Test I 
score subtracted from PSE) is the 
magnitude of FAE following the 
first satiation period; similarly, the 
difference between Pretest and Test 
II score is the measure of FAE 
following the second satiation period. 
Thus, the constant error is not 
included in the measures of FAE. 
In this situation FAE appears as an 
apparent shrinking or narrowing of 
the test blocks, so that larger (posi- 
tive) differences between the Pretest 
score and the Test I or II scores 
indicate larger FAE. 
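
The scoring rules just described can be sketched in a few lines. This is an illustrative sketch only: the function name and the judgment values in the example are hypothetical, and only the definitions (PSE as the Pretest mean, constant error as PSE minus physical width, FAE as PSE minus the Test mean) are taken from the text.

```python
from statistics import mean


def score_subject(physical_width, pretest, test1, test2):
    """Score one S from three sets of four width judgments (all values in inches)."""
    pse = mean(pretest)  # Pretest mean is the PSE
    return {
        "pse": pse,
        # Physical width subtracted from PSE; positive means overestimation.
        "constant_error": pse - physical_width,
        # Test score subtracted from PSE; positive means apparent shrinking.
        "fae_test1": pse - mean(test1),
        "fae_test2": pse - mean(test2),
    }


if __name__ == "__main__":
    # Hypothetical judgments for an S assigned to the 1-in. block.
    scores = score_subject(
        physical_width=1.0,
        pretest=[1.25, 1.0, 1.25, 1.0],
        test1=[1.0, 1.0, 1.0, 1.0],
        test2=[0.75, 1.0, 0.75, 1.0],
    )
    print(scores)
```

Because both FAE measures are differences from the same S's own PSE, any constant error common to the Pretest and the Tests cancels out of them.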

Aftereffect—The mean FAE for 
each group (each test block width) 
was computed for both Test I and 
Test II. These means are plotted as 
a function of the differences in width 
between the inspection block and 
(narrower) test blocks in Fig. 1. 
(The baseline in Fig. 1 is plotted 
in difference scores because in prin- 
ciple, magnitude of FAE should 
depend, other things equal, only on 
relative distance between inspection 
and test contours, not on absolute 
sizes of test figures. Some studies 
[Sagara & Oyama, 1957, Fig. 2] seem 
to indicate that absolute sizes are 
important, but in these cases the 
possible operation of Weber's Law 
was not considered.) By inspection, 
the curves in Fig. 1 show the inverted 
U shape that would be expected for a 
distance gradient, i.e., FAE first 
Fig. 1. Magnitude of figural aftereffect 
as a function of difference in width between 
a 2-in.-wide inspection block and narrower 
test blocks. 
TABLE 1 

TREND ANALYSIS OF FIRST AFTEREFFECT 
MEASURE ON ORIGINAL GROUPS 

Source 

Between groups 
Linear 
Quadratic 
Cubic 

increases, then decreases, as test 
blocks become progressively narrower 
than the inspection block. 

Since the ¾-in. group seemed out of 
line, another group of Ss was run 
to check this point. The data for 
the second ¾-in. group are indicated 
by the black dots in Fig. 1. Although 
it does not appear that the first 
¾-in. group represents an unusual 
departure from the gradient, in view 
of the data from the second ¾-in. 
group, duplicate statistical analyses 
were run, one incorporating the original, 
the other incorporating the 
second, ¾-in. group. 

A trend analysis based on orthogonal 
polynomials (Grant, 1956) was used to 
test the gradients in Fig. 1. There 
were four such analyses, duplicate 
analyses (for the reason noted above) 
on both Test I and Test II. For 
all four sets of data, the variances 
were heterogeneous by Bartlett's test 
(P’s < .01); because of this, only high 
levels of significance will be accepted 
(Lindquist, 1953). 
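
As an illustration of how a trend analysis of this kind partitions the between-groups variation, the sketch below computes the sums of squares for the linear and quadratic contrasts over nine group means. It is a sketch under stated assumptions, not a reproduction of the original analysis: the coefficients are the standard orthogonal polynomial values for nine equally spaced levels (the block widths used here may not be exactly equally spaced), and the group means are hypothetical values chosen to form a symmetric inverted U.

```python
# Standard orthogonal polynomial coefficients for 9 equally spaced levels.
LINEAR = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
QUADRATIC = [28, 7, -8, -17, -20, -17, -8, 7, 28]


def contrast_ss(means, coeffs, n_per_group):
    """Sum of squares (1 df) for one contrast over group means with equal n."""
    weighted_sum = sum(c * m for c, m in zip(coeffs, means))
    return n_per_group * weighted_sum ** 2 / sum(c * c for c in coeffs)


if __name__ == "__main__":
    # Hypothetical mean FAE (in.) for nine groups; NOT the values plotted in Fig. 1.
    hypothetical_means = [0.02, 0.05, 0.08, 0.10, 0.11, 0.10, 0.08, 0.05, 0.02]
    n = 20  # Ss per group, as in the Method section
    print("linear SS:   ", contrast_ss(hypothetical_means, LINEAR, n))
    print("quadratic SS:", contrast_ss(hypothetical_means, QUADRATIC, n))
```

For a symmetric inverted-U set of means such as this one, the linear component is essentially zero and nearly all of the trend variation falls in the quadratic component; each contrast sum of squares would then be tested against the within-groups mean square.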

Table 1 is a summary of the 
analysis of the Test I gradient, based 
on the original groups. The only 
polynomial term that is clearly signifi- 
cant is the quadratic (the quartic 
term is ignored because it did not 
reach even the 5% level in the other 
three analyses). Since the between- 
groups term is significant at the 1% 
level, it will also be considered a 
reliable finding. The tables for the 
other three analyses will not be 
presented; in all of them the only 
significant terms were the quadratic 
polynomial (P < .001 in each case), 
and between-groups (P < .01 in two 
cases, <.05 in the other). Thus, it 
seems reasonable to conclude that 
the means differ, and that the gra- 
dients would best be described by a 
quadratic function. 

In Fig. 1, the lowest point on each 
gradient (least FAE) is the mean 
for the }-in. group. This mean was 
compared to all other points on the 
gradient by t, using the modified 
degrees of freedom equation (Walker 
& Lev, 1953). On Test I, eight of the 
nine t’s were significant (one at only 
the 5% level); the comparison with 
2 in. was not. On Test II, six of 
the nine comparisons were significant 
(four at the 5% level); those with 
}-, 2-, and 2-in. groups were not. In 
general, the pattern of t’s supports 
the finding of significant kinesthetic 
FAE and of a quadratic distance 
gradient. 

The point representing identity 
of inspection and test blocks (2-in. 
group) was not significantly different 
from zero (no FAE) for either Test I 
(t = 1.0) or Test II (t = 1.5). In 
other words, FAE was numerically, 
but not significantly, different from 
zero when test and inspection blocks 
were the same size. 

The over-all difference in magnitude 
of FAE between Test I and Test II 
was highly significant; the t for related 
measures was 9.2. Thus, the 
second inspection period produced 
a significant increase in FAE over 
the first period. 

Controls.—Although the evidence 
for significant kinesthetic FAE and 
for a distance gradient appears clear-cut, 
it has yet to be shown whether 
the results are independent of practice 
effects or of constant errors (even 
though constant errors have been 
subtracted, by the method of measurement, 
from the means of Fig. 1). 
To analyze practice effects, the mean 
judged size of block on each of the 
four trials on each of the three 
tests (Pretest, Test I, and Test II) 
was computed for each group. By 
inspection, the 1-in. group appeared 
to show the greatest progressive 
change over all of these 12 trials; 
therefore an analysis for repeated 
measures (Walker & Lev, 1953) 
was done on this group. The results 
are presented in Table 2, where Tests 
indicates the three sets of four trials. 
It can be seen that neither the Trials 
nor the Tests × Trials terms is significant. 
Because of this, analyses of 
the other groups were not considered 
necessary; it is concluded that practice 
effects within experimental groups 
can be ignored. (The error terms for 
the main effects in Table 2 are those 
recommended by Walker and Lev 
[1953].) 


TABLE 2 

ANALYSIS OF VARIANCE OVER TRIALS WITHIN 
TESTS ON THE 1-IN. GROUP 

Source          df      MS       F 

Tests (T)        2    72.77   28.99*** 
Trials (Tr)      3     1.95    1.09 
Subjects (S)    19    26.71    8.05*** 
T × Tr           6      .69 
T × S           38     2.51    2.56** 
Tr × S          57     1.79    1.83 
T × Tr × S     114      .98 

** P = .01. 
*** P = .001. 
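
The F ratios reported in Table 2 can be checked against the mean squares. The sketch below does so under an assumption that is inferred from the arithmetic rather than stated in the text: Tests is taken to be tested against T × S, Trials against Tr × S, Subjects against the composite (T × S) + (Tr × S) − (T × Tr × S), and the two-way interactions with Subjects against the three-way interaction. The dictionary keys are illustrative labels only.

```python
# Mean squares as printed in Table 2.
MS = {
    "T": 72.77,
    "Tr": 1.95,
    "S": 26.71,
    "TxTr": 0.69,
    "TxS": 2.51,
    "TrxS": 1.79,
    "TxTrxS": 0.98,
}

# Assumed error terms (inferred because they reproduce the printed F values).
subjects_error = MS["TxS"] + MS["TrxS"] - MS["TxTrxS"]

F = {
    "T": MS["T"] / MS["TxS"],
    "Tr": MS["Tr"] / MS["TrxS"],
    "S": MS["S"] / subjects_error,
    "TxS": MS["TxS"] / MS["TxTrxS"],
    "TrxS": MS["TrxS"] / MS["TxTrxS"],
}

if __name__ == "__main__":
    for source, value in F.items():
        print(f"{source:5s} F = {value:.2f}")
```

With these error terms every printed F is recovered to two decimals, which suggests (but does not prove) that this is the layout of error terms that was used.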

It will be recalled that a control group, 
treated the same way as the 1-in. experi- 
mental group except that rests were sub- 
stituted for inspection periods, was run later. 
Although not strictly comparable to the 
experimental groups, it seems worthwhile to 
report the data from the control group at 
this point. None of the differences among 
the control group means on the Pretest, Test 
I, and Test II was significant by t. The 
means of the Pretest for the 1-in. experimental 
group and the control group were not significantly 
different, but the differences between 
these groups were significant both on Test I 
(t = 2.9) and Test II (t = 3.4). If these 
comparisons are not merely the result of 
sampling bias, they are another indication 
that the inspection periods do produce 
significant distortions in size judgments 
(FAE) which are not attributable to practice 
effects. 


To examine constant errors, the 
mean constant error was computed 
for each group (the score for each S 
was obtained by subtracting the 
physical size of the test block for 
that S from the mean of that S’s 
four Pretest trials). These means 
are shown in Table 3 (where positive 
values indicate overestimation of block 
width). It may be seen that, except 
for the somewhat low value for the 
1¾-in. group (reason unknown), the 
mean constant error increased fairly 
regularly from the }-in. to the 2-in. 
block, i.e., the pattern does not follow 
the gradient in Fig. 1. A more direct 
evaluation of constant error was made 
by computing a Pearson r between 
constant error for each S and FAE 
on Test I, using all 180 Ss of the 
original experimental groups. This 
r was .17, which is significant at the 
5% level. But in this situation PSE’s 
are values which are mostly larger 
than physical block size, while FAE’s 
are in the opposite direction, i.e., 
values generally smaller than either 
PSE or physical block size. Therefore, 
errors of measurement in PSE’s 
are necessarily directly related to 
FAE. Thus, a direct relationship 
would be expected between either 
PSE or constant error and FAE in 
this case. Because of this, and because 
of the low value of r, there is 
little reason to think that there is 
much relation between constant errors 
and FAE. 


TABLE 3 

MEAN CONSTANT ERROR FOR EACH 
TEST BLOCK 

Width (in.)    Mean (in.)    σM 

⅛               -.002        .026 
¼                .014        .026 
½                .006        .047 
¾                .070        .079 
1                .119        .034 
1¼               .124        .033 
1½               .163        .032 
1¾               .085        .056 
2                .165        .056 
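
The statement that an r of .17 based on 180 Ss reaches the 5% level can be checked with the usual t transformation of a correlation coefficient. The sketch below is illustrative; the function name is hypothetical, and the critical values are approximate two-tailed table values for about 178 df that are supplied here as assumptions rather than taken from the text.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic (n - 2 df) for testing a Pearson r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Approximate two-tailed critical values of t for about 178 df (assumed).
T_CRIT_05 = 1.97
T_CRIT_01 = 2.60

if __name__ == "__main__":
    t = t_from_r(0.17, 180)
    print(f"t = {t:.2f}; beyond 5% level: {abs(t) > T_CRIT_05}; "
          f"beyond 1% level: {abs(t) > T_CRIT_01}")
```

On these assumptions the correlation exceeds the 5% criterion but not the 1% criterion, and r squared is only about .03, which is consistent with the conclusion that constant error has little relation to FAE.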


Supplementary findings.—Although an at- 
tempt was made to force all Ss to make 100 
movements (single traverses of the inspection 
block) during a 1-min. inspection period, 
the actual range (over both inspection 
periods) was 56 to 168. The Pearson r be- 
tween number of moves during the first 
inspection period and magnitude of FAE 
following that period was only —.04. 

The FAE data were analyzed for sex 
differences. The number of men ranged from 
7 to 10 in each group. Inspection of the 
distance gradients for men and women sepa- 
rately gave little reason to suspect differences. 
The grand means for men and women were 
not significantly different on either Test I 
or Test II, but tests for homogeneity of 
variance showed that the men were signifi- 
cantly more variable on both tests (P < .001 
in each case). 

The E had instructed S to make each 
judgment in about 15 sec. For the 1-in. 
experimental group, the mean time per judgment 
was 16.3, 14.7, and 14.3 sec. for the 
Pretest, Test I, and Test II, respectively. A 
scatter plot of median time to make judgments 
and magnitude of FAE on Test I 
was prepared; by inspection, there appeared 
to be no correlation. 

The time required for E to reposition the 
slider, the scale, and S's hand determined 
the time between judgments. The mean 
time was 10.2 sec. 

Since only 17 Ss were left-handed, little 
can be said about handedness as a variable. 
However, inspection of the data from these 
17 Ss did not indicate any difference from 
right-handed Ss. 


DISCUSSION 


The two main purposes of the study 
appear to have been achieved. Kines- 
thetic FAE, independent of constant 
errors, practice effects, or both-hand 
satiation, was found, and a distance 
gradient, for blocks smaller than the 
inspection block, was demonstrated. 
The gradient appeared to be primarily 
a quadratic function. 

The gradients reported by Japanese 
investigators (Sagara & Oyama, 1957) 
for vision showed some FAE when 
inspection and test figures were identical, 
contrary to Kohler’s prediction (Kohler, 
1951; Kohler & Dinnerstein, 1947; 
Kohler & Wallach, 1944). The gradient 
found in the present study partially 
supports the Japanese findings. Although 
the FAE measure at the point 
of coincidence was only numerically, 
not significantly, different from zero, 
it is quite likely that the value would 
have been significant if more inspection 
periods or a larger N had been used. 

The kinesthetic gradient found here 
and visual gradients found by the Japa- 
nese correspond even more precisely in 
another respect. In each case the peak 
of the gradient occurred at a test figure 
one-half the size of the inspection figure, 
where size means width of block in 
kinesthesis, diameter of outline circle in 
vision. This relationship between mag- 
nitude of FAE and relative sizes of test 
and inspection figures may have some 
generality, since the Japanese have 
found it with a number of different 
absolute sizes of inspection figures. 

Both Kohler and Dinnerstein (1947), 
and Wertheimer (1954) reported relative 
permanence of kinesthetic FAE as indicated 
by an increase in FAE following 
additional satiation given after a rest. 
This finding has been verified by Wertheimer 
and Leventhal (1958) in a more 
recent study, although as noted earlier, 
both-hand satiation was used. In the 
present study the second inspection 
period, separated from the first period 
by four judgments plus a 1-min. rest, 
also resulted in a significant increase in 
FAE, with the gradient maintaining its 
shape. This increase suggests that the 
satiation developed during the first 
inspection period had suffered little 
decay by the time the second period was 
introduced. Furthermore, there was no 
evidence of a decrease in magnitude of 
FAE during the set of four judgments 
following each inspection period, each 
set consuming a minimum of 2 min. 
In contrast, the data on visual FAE 
(Sagara & Oyama, 1957) indicate that 
after inspection periods of 1 min., 
satiation decays rapidly, i.e., magnitude 
of FAE decreases sharply in the first few 
sec. after inspection. Thus, the decay 
function of kinesthetic FAE may be of 
vastly different slope from that of visual 
FAE. The other alternative is that 
initial magnitude of FAE is relatively 
much greater, for equal inspection times, 
in kinesthesis than in vision; since a 
single measurement of kinesthetic FAE 
required about 15 sec., it is possible 
that the sharp initial decrease noted 
in vision had already taken place. 

Finally, it may be noted that the 
findings of this study and those of the 
Japanese in vision, viz., distortion of 
size judgments and a distance gradient, 
support fairly well the descriptive and 
topological (as distinguished from the 
neurological) aspects of Kohler and 
Wallach's (1944) theory of neural satia- 
tion. By topological aspects we refer to 
their claims that FAE result from the 
effect that one bounded area of stimula- 
tion has upon another nearby area, 
and that this effect varies complexly 
as a function of distance between the 
contours of the areas. The contradic- 
tion that FAE may occur when test and 
inspection contours coincide might, as 
Kohler and Dinnerstein (1947) suggest, 
be accounted for by examining the pos- 
sibilities for self-satiation in inspection 
figures alone. 


SUMMARY 


In order to study kinesthetic figural 
aftereffect, groups of Ss were first tested for 
constant error in judging width of blocks, 
then inspected a 2-in.-wide block by running 
thumb and finger of one hand along opposite 
sides of the block for 1 min. Each S then 
gripped a test block with that hand and 
estimated its width by means of a scale held 
in the other hand. A different group of Ss 
was assigned to each of nine test blocks which 
varied in width from } in. through 2 in. 
After a 1-min. rest, the inspection and test 
procedures were repeated. 
The results were: 


1. Kinesthetic figural aftereffect in the 
form of significant underestimation of test- 
block width was found. Constant error in 
width judgments, practice, and certain 
individual difference variables were ruled 
out as explanations of the aftereffect. The 
second inspection period significantly in- 
creased the amount of aftereffect. 

2. A significant distance gradient, of 
inverted-U shape, appeared; amount of 
aftereffect first increased, then decreased, 
as width of test block decreased. 

3. Qualitatively, aftereffects in kinesthesis 
seem to parallel those found in vision, but 
the results suggest that the two modalities 
must differ widely either on amount of after- 
effect developed in the same inspection time, 
or on rate of decay of aftereffect over rest. 
This study investigated the constraint 
upon words attributable to the 
length, distribution, and structure of 
context consisting of incomplete sen- 
tences. The measure of constraint 
employed was single-guess word pre- 
dictability. The positive relationship 
between length and constraint has 
already been demonstrated in the 
case of letters (Burton & Licklider, 
1955; Shannon, 1951). While there 
is no reason to believe that this 
relationship will not hold where word 
rather than letter predictability is 
employed, there may well be a dif- 
ference in the length of context 
required for maximum constraint. 
For letter prediction, this length 
has been reported to be 32 letters 
(Burton & Licklider, 1955). The 
present study afforded an opportunity 
to observe the constraint on words 
in contexts well beyond the 32-letter 
limit. 

1 This paper is based upon research carried 
out while the first two authors were with the 
Air Force Personnel and Training Research 
Center and the third was principal investi- 
gator of Contract AF 41(657)-137. The 
work was done under ARDC Project No. 
7730, Task No. 17125, in support of the 
research and development program of the 
Air Force Personnel and Training Research 
Center, Lackland Air Force Base, Texas. 
Permission is granted for reproduction, 
translation, publication, use, and disposal 
in whole or in part by or for the United 
States Government. 
The relationship between constraint 
and the distribution of context, i.e., 
whether the context totally precedes, 
follows, or is situated on both sides of 
the dependent (the word to be 
predicted), has not received much 
attention from investigators. The 
results of studies by Kaplan (1950) 
and Miller and Friedman (1957) 
indicate that bilateral distribution 
exerts greater constraint than either 
form of unilateral distribution. The 
present study attempted to confirm 
this for contexts encompassing whole 
sentences. 

Of the relationship between con- 
straint and the structure of context, 
little is known beyond Kaplan's 
(1950) finding that short contexts 
(1-4 words) consisting entirely of 
articles, prepositions, conjunctions, 
etc. (particle contexts) exert less semantic 
constraint than contexts containing 
at least one noun, verb, adjective, 
or adverb (substantive contexts). Since 
Kaplan's characterization of contexts 
as particle or substantive is un- 
fortunately inapplicable to longer 
contexts—almost all contexts longer 
than four words would be substantive—the 
difficulty of classifying structures 
so that the classification bears 
some relation to the degree of constraint 
is still unresolved. Thus, 
structure was investigated in the 
present study only in a very indirect 
way; namely, through the effect of 
the grammatical class of words on 
their predictability. The connection 
between the grammatical class of the 
dependent and the structure of the 
context would lie, of course, in the 
fact that the structure of the context 
determines the class to which its 
dependents belong. Although this 
relationship is attenuated by the fact 
that most contexts will admit more 
than one class of dependents, one 
class will in general be predominant; 
that is to say, the most frequent 
dependents will generally belong to 
the same class. Certain inferences 
may thus be drawn between the structure 
of the context—defined in terms 
of the class of its most frequent 
dependents—and its contribution to 
constraint. Therefore, if nouns turn 
out to be less predictable than verbs, 
it may be inferred that contexts 
having a structure such that the most 
frequent dependents are nouns exert 
less constraint than contexts in which 
the most frequent dependents are 
verbs. 


METHOD 


Materials 

About 3,000 printed English sentences 
exactly 6, 11, and 25 words long were drawn 
randomly from a number of back issues of a 
representative selection of popular magazines. 
Contractions were counted as two words, 
hyphenated words as one. Sentences punc- 
tuated with a semicolon, containing dots to 
show omission, or appearing as captions to 
pictures were excluded. Other than that, 
no restrictions were placed on the punctua- 
tion, structure, or content of the sentences 
that could be drawn. 

Word classification.—Each word of every 
sentence in the sample was classified as 
belonging to one of six classes: noun, verb, 
adjective, adverb, pronoun, or function word. 
The class of function words includes articles, 
conjunctions, prepositions, auxiliary verbs, 
interjections, and quantity words. 

Although they bear familiar designations, 
these classes differ somewhat from the 
traditional classes in that they are based on 
Fries’ (1952) system of analysis with two 
important modifications: First, the pronoun 
was treated as a separate class apart from the 
noun. This was advisable since it appeared 
likely that the pronoun would be found to 
differ from the noun in predictability and 
since Aborn and Rubenstein (1957) had 
already observed it to be statistically distinct. 
Secondly, all of Fries’ groups of function 
words were combined into one class—both 
for the sake of simplicity of design and because 
all are highly dependent upon context. 
Other differences between the classification 
used in the present study and Fries’ system 
have been taken up in detail in the Aborn 
and Rubenstein study (1957). 

Position of omission.—Four positions were 
selected from which a single word would be 
omitted in the experimental sentences, i.e., 
those sentences of the entire sample that were 
to be administered to Ss. Since the sentences 
were of three different lengths, two of the 
four positions of omission were comparable 
rather than identical from one length to the 
next. The positions selected were: sentence 
initial, early medial (the third word in 6-word 
sentences, the fourth word in 11-word sen- 
tences, and the eighth word in 25-word 
sentences), late medial (the fourth word in 
6-word sentences, the eighth word in 11-word 
sentences, and the seventeenth word in 25- 
word sentences), and sentence final. (Recall- 
ing that contractions were originally counted 
as two words, any contraction appearing in a 
position of omission was rewritten as two 
separate words.) 
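
The four positions of omission can be written out as a small lookup table. The following is a minimal sketch, not part of the original procedure: the helper name `blank_word` and the five-underscore blank are assumptions, and the sentence is assumed to have contractions already rewritten as two words so that splitting on spaces gives the word count used in the study.

```python
# 1-based positions of the omitted word, as listed in the text.
OMISSION_POSITIONS = {
    6: {"initial": 1, "early medial": 3, "late medial": 4, "final": 6},
    11: {"initial": 1, "early medial": 4, "late medial": 8, "final": 11},
    25: {"initial": 1, "early medial": 8, "late medial": 17, "final": 25},
}


def blank_word(sentence, position_name):
    """Return (sentence with a blank, omitted word).

    Hypothetical helper: words are split on whitespace, so contractions
    must already be written as two separate words.
    """
    words = sentence.split()
    index = OMISSION_POSITIONS[len(words)][position_name] - 1
    omitted = words[index]
    words[index] = "_____"
    return " ".join(words), omitted


blanked, omitted = blank_word("The big store window was broken", "early medial")
print(blanked)  # The big _____ window was broken
print(omitted)  # store
```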

Experimental sentences.—The entire sample 
of 6-word sentences was divided into six
groups according to the class of the first word. 
Twenty sentences were then chosen from 
each group. The sentences were chosen at 
random, but those having a proper noun or a 
numeral in the position of omission were 
discarded in favor of some other sentence. 
(The names of persons, places, or numbers 
are nearly impossible to predict in sentence- 
length contexts and their inclusion would 
merely serve to reduce the proportion of 
correct responses.) The remainder of the 
6-word sentences were arranged and rear- 
ranged three times more, once according to 
the classification of words in the early medial 
position, once according to the classification 
of words in the late medial position, and once 
according to the classification of words in the 
final position. Each time, 20 sentences were 
chosen randomly (under the same restrictions 
as before) from each of the six word-class 
groups. There was one exception: Since
function words occur very infrequently in the
last position of the sentence (in printed
English at least), sentences were chosen with
only five classes represented in that position.


The entire procedure for selecting the
experimental sentences from the 6-word
sample was repeated for the 11- and 25-word 
samples. When the procedure was complete, 
there were 1,380 experimental sentences 
divided into three treatments of length (with 
460 sentences each), within which were four 
treatments of position (with 120 sentences 
each except in sentence final), within which 
were six treatments of word class (with 20 
sentences each) minus one class in sentence 
final at each length, giving, in all, 69 treatment 
combinations capable of testing the effects 
of any one variable under controlled conditions 
of the other two. 
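
The counts in this paragraph follow from the design itself. As a check, the sketch below enumerates the treatment combinations and recovers the totals given in the text; the variable names are illustrative only.

```python
LENGTHS = (6, 11, 25)
POSITIONS = ("initial", "early medial", "late medial", "final")
CLASSES = ("noun", "verb", "adjective", "adverb", "function word", "pronoun")
SENTENCES_PER_CELL = 20


def design_cells():
    """List every (length, position, class) combination that was tested."""
    return [
        (length, position, word_class)
        for length in LENGTHS
        for position in POSITIONS
        for word_class in CLASSES
        # Function words were not sampled in sentence-final position.
        if not (position == "final" and word_class == "function word")
    ]


cells = design_cells()
total_sentences = len(cells) * SENTENCES_PER_CELL
per_length = {
    n: SENTENCES_PER_CELL * sum(1 for c in cells if c[0] == n) for n in LENGTHS
}
per_position = {
    p: SENTENCES_PER_CELL * sum(1 for c in cells if c[0] == 6 and c[1] == p)
    for p in POSITIONS
}
print(len(cells), total_sentences)  # 69 1380
print(per_length)
print(per_position)
```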

Test booklets.—Copies of each sentence,
with an underlined blank space appearing in 
place of the omitted word, were mimeographed 
on separate sheets of paper. Each sheet bore 
a code number identifying the combination 
of treatments represented by the sentence. 
The sheets were assorted into stacks of 20 
and separated into groups representing the 
69 treatment combinations. The sheets 
were then collated into booklets in accordance 
with a previously prepared table of permuta- 
tions which arranged the code numbers into 
groups of 20 sets of 69, with no duplications. 
In brief, the test booklets were made up of 69 
sentences each, every booklet containing one 
sentence from each combination of treatments, 
arranged in sets of 20 so that each set 
exhausted all 1,380 sentences, with the sentences
appearing in a different sequence in
each booklet.
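
The collation rule can be illustrated in code. The sketch below is an assumption-laden stand-in: the study used a previously prepared table of permutations, whereas this version draws the permutations from a seeded random generator, and the function name `build_booklet_set` is hypothetical. It preserves the properties described above: 20 booklets per set, 69 sheets per booklet, one sheet per treatment combination, and no sentence repeated within a set.

```python
import random


def build_booklet_set(n_cells=69, sentences_per_cell=20, seed=0):
    """Return 20 booklets, each a shuffled list of (cell, sentence) pairs."""
    rng = random.Random(seed)
    # For each treatment cell, the order in which its sentences are dealt out.
    deal_order = [
        rng.sample(range(sentences_per_cell), sentences_per_cell)
        for _ in range(n_cells)
    ]
    booklets = []
    for b in range(sentences_per_cell):
        pages = [(cell, deal_order[cell][b]) for cell in range(n_cells)]
        rng.shuffle(pages)  # a different page sequence in each booklet
        booklets.append(pages)
    return booklets


booklets = build_booklet_set()
print(len(booklets), len(booklets[0]))  # 20 69
```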


Subjects 


The experimental sentences were ad- 
ministered to 24 second-semester freshmen 
at the University of Alabama.2 An attempt
was made to secure a relatively homogeneous 
group with regard to academic proficiency in 
general and high verbal ability in particular. 
Subjects were selected from students scoring 
in or very near the top decile of both the 
English placement test and the reading 
comprehension test given by the university. 
In addition, candidates for S were admin- 
istered the School and College Aptitude Test, 
and only those achieving total or verbal 
scores falling within the upper quartile of the 
national norms were chosen. 

2 This phase of the study was carried out 
under Contract AF 41(657)-137. 
Procedure


The Ss met as a group for twenty 1.5-hr. 
testing sessions conducted over a period of 
four weeks. Each time they met, each S 
completed one of the 20 booklets in the set 
assigned to him. The following directions 
were read to the group at the beginning of 
every session:

“Each sentence has a word missing in it. 
The missing word is indicated by an under- 
lined blank space. One, and only one, word 
is missing from the sentence. It may 
occasionally be a hyphenated word, but it is 
never two words, it is never a contraction 
(don't, won't, shouldn't, can't, etc.), and it is 
never the name of a person, the name of a 
place, or a number. 

“Read each sentence carefully. Think
about the sentence before you fill in the
missing word. These are real sentences 
taken from popular magazines. Write in the 
word you think was most likely to have 
occurred in that sentence. 

“Don't be disconcerted if many sentences 
suggest the same missing word to you. Don't 
deliberately strive for variation among your 
guesses. Always guess what was most likely 
the word missing from the sentence at hand, 
regardless of your guesses on other sentences. 

“Be sure to complete every sentence in the 
booklet. Leave no sentence out. When you 
are finished, go back and count to see if you 
have completed all 69 sentences in the booklet. 

“The underlines indicating the missing 
word are all the same size. The length 
of the underline is absolutely no indication 
of whether the missing word was a long or a 
short word. 

“Write in your guess in any part of the 
empty page below the sentence. Don't try 
to crowd it into the blank space itself. Write 
legibly. We cannot give you credit for a 
word we cannot read.” 

To help motivate Ss, double pay was 
offered to those achieving the 10 highest 
scores. 

Scoring.—A response was scored as correct 
only if it reproduced the missing word exactly 
as it originally appeared in the sentence. 
Variations in number, tense, or person were
scored as incorrect. Misspellings, however,
were disregarded if the response was
unambiguous.
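
The scoring rule amounts to an exact-match test. The sketch below is illustrative only: ignoring letter case and surrounding spaces is an assumption, and the tolerance for unambiguous misspellings, which depended on the scorer's judgment, is not modelled. Both function names are hypothetical.

```python
def score_response(response, omitted_word):
    """Return True only for an exact reproduction of the omitted word.

    Ignoring case and surrounding spaces is an assumption of this sketch.
    """
    return response.strip().lower() == omitted_word.strip().lower()


def count_correct(responses, omitted_words):
    """Number of exactly reproduced words in one booklet."""
    return sum(score_response(r, w) for r, w in zip(responses, omitted_words))


print(score_response("window", "window"))   # True
print(score_response("windows", "window"))  # False: variation in number
```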
RESULTS AND DISCUSSION 
TABLE 1

MEAN NUMBER OF WORDS CORRECTLY PREDICTED IN EACH
COMBINATION OF TREATMENTS

Position of    Sentence                     Word Class
Omission       Length     Noun    Verb    Adj.    Adv.    F.W.    Pro.

Initial        6-word     3.79    9.00    1.46    3.04    9.04    8.58
               11-word    3.21    5.83    4.25    5.38   12.92   11.46
               25-word    3.92    5.79    4.08    3.21   11.96   11.71
Early medial   6-word     4.54    6.17    4.00    7.25   13.08    8.83
               11-word    6.25    8.79    5.63    4.79   11.96   13.25
               25-word    7.46   10.75    5.25    5.00   13.88   14.63
Late medial    6-word     5.29    5.46    5.50    9.96   11.67    7.83
               11-word    5.63    7.88    6.08    8.46   14.46   11.46
               25-word    3.17   10.58    5.17    4.83   13.54   15.83
Final          6-word     5.54    3.63    1.46    7.46       -    5.96
               11-word    6.33    8.29    3.29    7.17       -   10.71
               25-word    3.75    6.71    4.54    9.63       -   11.25

The differences in predictability
occurring under the various treatments
of word class, position of omission,
and sentence length are shown
by the means in Table 1. These 
means were obtained by summing 
the number of correct predictions in 
each treatment combination and di- 
viding by 24, the number of Ss. In 
order to test the significance of the 
main effects directly from these data, 
each mean was taken as a single score 
representing a maximum likelihood 
estimate of an S’s performance on 20 
sentence replications. Since it was 
theoretically possible for each of 
these scores to have assumed any 
value between 0 and 20, the scores 
could be regarded as a random 
variable over a subset of the con- 
tinuous interval 0 to 20. The dis- 
tribution of means was approximately 
Poisson, and the square-root trans- 
formation was used as the most 
appropriate transform for normalizing 
the distribution for analysis of vari- 
ance (Bartlett, 1947). 
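
The transformation step can be sketched briefly. For a Poisson variable the variance equals the mean, and the variance of its square root is roughly 1/4 whatever the mean, which is why the square root is used before an analysis of variance. The function name below is hypothetical; the input values are the initial-position, 6-word row of Table 1.

```python
import math


def sqrt_transform(means):
    """Square-root transform applied to each cell mean before the analysis."""
    return [math.sqrt(m) for m in means]


# Initial position, 6-word sentences (noun, verb, adj., adv., F.W., pro.).
initial_6_word = [3.79, 9.00, 1.46, 3.04, 9.04, 8.58]
transformed = sqrt_transform(initial_6_word)
print([round(t, 3) for t in transformed])
```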

As shown in Table 1, there were
three cells for which no scores existed:
function word in the final position
of the 6-, 11-, and 25-word lengths.
In order to evaluate all of the data
in a single analysis, the mean squares
of the treatments and residual variances
were modified by estimating
the missing variates by the Yates
(1933) procedure.3

Table 2 summarizes the results of 
the analysis performed with the data 
shown in Table 1, as modified. It 
should be noted that 3 df were lost 
by the approximation of the missing 
variates. The correct estimate of the 
within variance turns out to be the 
pooled mean square for all interactions 
because replications have been elimi- 
nated and the interactions of the 
lower order fail to yield significant F 
values (Edwards, 1950, pp. 255-256). 
The results indicate that the obtained 
differences in predictability among 
word classes were highly significant 
and that the obtained differences 


3 As a check on the possible bias introduced
by estimating the missing data variance,
an analysis of the scores was made with
all values for sentence final omitted. This 
analysis yielded treatments and error mean 
squares that differed from those of Table 2 
by no more than one decimal. 
TABLE 2

ANALYSIS OF VARIANCE OF TRANSFORMED
MAXIMUM LIKELIHOOD ESTIMATES
FOR TREATMENT COMBINATIONS

Source of Variation          df      MS        F

Word class (C)                5    4.442    26.284*
Position of omission (P)      3     .467     2.763**
Sentence length (L)           2     .544     3.219**
C × P                        15     .203       -
C × L                        10     .206       -
P × L                         6     .164       -
C × P × L                    27     .137
Pooled error                 58     .169
Total                        68

Note.—The total number of df is 71. Since estimates
of three missing values were used, however, 3 df had
to be subtracted.

* P < .01.
** P < .05.


among positions of omission and 
sentence lengths were acceptably sig- 
nificant. As noted above, none of 
the interactions was significant, indi- 
cating that each of the three variables 
is capable of producing differences 
in constraint independently of the 
constraining effects of the other two. 
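
The pooled error term and the F ratios in Table 2 can be recomputed from the table's own df and MS columns. The sketch below weights each interaction mean square by its df; the results agree with the published figures to within rounding. Significance levels are not computed here, and the names are illustrative.

```python
# (df, mean square) for each source, copied from Table 2.
MAIN_EFFECTS = {
    "Word class (C)": (5, 4.442),
    "Position of omission (P)": (3, 0.467),
    "Sentence length (L)": (2, 0.544),
}
INTERACTIONS = {
    "C x P": (15, 0.203),
    "C x L": (10, 0.206),
    "P x L": (6, 0.164),
    "C x P x L": (27, 0.137),  # 30 df less the 3 lost to estimated values
}


def pooled_error(terms):
    """Pool interaction terms: total df and df-weighted mean square."""
    df = sum(d for d, _ in terms.values())
    sum_of_squares = sum(d * ms for d, ms in terms.values())
    return df, sum_of_squares / df


error_df, error_ms = pooled_error(INTERACTIONS)
f_ratios = {name: ms / error_ms for name, (_, ms) in MAIN_EFFECTS.items()}
print(error_df, round(error_ms, 3))  # 58 0.169
for name, f in f_ratios.items():
    print(name, round(f, 2))
```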

Word class.—The results pertaining 
specifically to word class are sum- 
marized in Table 3. The data in the 
table are derived from data already 
presented in Table 1. In the case 
of Table 3, however, the number of 
correct predictions for the various 
word classes was taken without regard 
to position of omission, and the cells 
are based on 80 sentences (60 sen- 
tences for function words) rather 
than 20, as in the case of Table 1. 
To avoid confusion with the figures 
in Table 1, the entries in Table 3 
are given as percentages rather than 
means. 
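
The derivation of Table 3 from Table 1 can be sketched as follows. Each percentage is the sum of a class's cell means over the available positions, divided by the number of sentences those cells represent (20 per cell). The function name is hypothetical. Recomputed values match the published percentages to within one point; the small differences presumably arise because the published figures were computed from unrounded data.

```python
# Table 1 means: rows are initial, early medial, late medial, final;
# columns are noun, verb, adj., adv., function word, pronoun.
TABLE_1 = {
    "6-word": [
        [3.79, 9.00, 1.46, 3.04, 9.04, 8.58],
        [4.54, 6.17, 4.00, 7.25, 13.08, 8.83],
        [5.29, 5.46, 5.50, 9.96, 11.67, 7.83],
        [5.54, 3.63, 1.46, 7.46, None, 5.96],
    ],
    "11-word": [
        [3.21, 5.83, 4.25, 5.38, 12.92, 11.46],
        [6.25, 8.79, 5.63, 4.79, 11.96, 13.25],
        [5.63, 7.88, 6.08, 8.46, 14.46, 11.46],
        [6.33, 8.29, 3.29, 7.17, None, 10.71],
    ],
    "25-word": [
        [3.92, 5.79, 4.08, 3.21, 11.96, 11.71],
        [7.46, 10.75, 5.25, 5.00, 13.88, 14.63],
        [3.17, 10.58, 5.17, 4.83, 13.54, 15.83],
        [3.75, 6.71, 4.54, 9.63, None, 11.25],
    ],
}


def percent_by_class(rows, sentences_per_cell=20):
    """Percentage correct per word class, pooled over positions."""
    percents = []
    for column in zip(*rows):
        scores = [x for x in column if x is not None]  # skip untested cell
        percents.append(100 * sum(scores) / (len(scores) * sentences_per_cell))
    return percents


for length, rows in TABLE_1.items():
    print(length, [round(p) for p in percent_by_class(rows)])
```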

Table 3 shows quite clearly that, 
beyond the matter of the reliability of 
the differences, differences in the 


predictability of words belonging to 
different word classes were of con- 
siderable magnitude. Function words, 
for example, were on the whole about
three times as predictable as adjec- 
tives; pronouns, about twice as pre- 
dictable as nouns. The differences 
are not only large, but appear to be 
inverse to the size of the class of the 
omitted word. We would expect 
these differences to be related to the 
relative size of the classes—reasoning 
that the larger the class of the omitted 
word, the greater the number of 
alternatives admitted by the context 
and consequently the lower the prob- 
ability of correct prediction. 

In this connection, it should be 
noted that great differences in size 
exist among the various word classes. 
French, Carter, and Koenig (1930), 
for example, found that among 2,240 
different words in a sample of 79,390 
words of telephone conversation, 46% 
were nouns, 28% were adjectives or 
adverbs, 20% were verbs (excluding 
auxiliaries), and 6% were pronouns 
and words classified as function words 
in the present study. Similarly, Fries 
(1952) found in 1,000 different words 
of telephone conversation that 39% 
were Class 1 words (nouns and pro- 
nouns), 25% Class 2 words (verbs), 
17% Class 3 words (adjectives), 12% 
Class 4 words (adverbs), and 7% 
function words. The descending 
hierarchy of classes according to size, 
then, is: noun, verb, adjective, ad- 
verb, function word, and pronoun. 


TABLE 3

PERCENTAGE OF WORDS CORRECTLY
PREDICTED BY CLASS

Sentence                  Word Class
Length      Noun   Verb   Adj.   Adv.   F.W.   Pro.

6-word       24     30     16     35     56     39
11-word      27     39     24     32     66     59
25-word      23     42     24     28     66     67
Mean %       25     37     21     32     63     55

Comparing this with the ascending
hierarchy according to predictability 
—adjective, noun, adverb, verb, pro- 
noun, and function word—there would 
indeed appear to be some degree of 
inverse relationship, but disturbed 
by reversals. Not too much signifi- 
cance can be placed upon the finding 
that function words are more pre- 
dictable than pronouns. Both are 
such small classes—there are about 
50 pronouns and 150 function words 
in the language—that the difference 
in size is probably too slight to be 
detected by a single-guess measure 
of predictability. The unexpectedly 
low predictability of the adjective 
and adverb, however, is a more serious 
matter. For some reason, the num- 
ber of alternatives from which Ss 
selected their responses was greater 
in the case of omitted words which 
were adjectives or adverbs than we 
would expect from the sizes of these 
classes. One possible explanation is 
that in contexts where the omitted word 
was an adjective or adverb, Ss tended 
to draw their responses from other 
classes as well. To test this possibility,
the number of responses disagreeing
with the class of the omitted
word was determined (Table 4). 
While showing that the amount of 
disagreement was generally low, Table 
4 clearly indicates that the tendency 
for Ss to draw responses from a class 
other than that of the omitted word 
was most pronounced in the case 
of the adjective and adverb. In 
other words, the grammatical con- 
straint exerted by the context is 
weakest for the adjective and adverb. 
The following sentences may serve as 
examples of contexts in which the 
adjective or adverb might be replaced 
by a different word class: Good 
(adjective) or That (function word) 
cheese is expensive. The big (adjec- 
tive) or store (noun) window was 
broken. He was sick (adjective) or 
killed (verb). Quietly (adverb) or 
Discouraged (adjective) he left the 
room. They found him there (adverb) 
or working (verb). He read quickly 
(adverb) or it (pronoun).

The effects due to differences in 


TABLE 4

PERCENTAGE DISAGREEMENT WITH THE WORD CLASS OF THE OMITTED WORD

[Table body not recoverable from the source. Rows: position of omission (initial, early medial, late medial, final) by sentence length (6-, 11-, and 25-word), with a mean % row; columns: word class.]

grammatical constraint were eliminated
by analyzing the data of only
those sentences where Ss all drew 
their responses from the class of the 
omitted word. The following means 
over all sentence lengths and over all 
positions but initial (there were too 
few such sentences for adjectives and 
adverbs in initial position) were then 
obtained: nouns 30%, adjectives 30%,
verbs 38%, adverbs 46%, pronouns 
59%, and function words 69%. Ap- 
parently, the low predictability of 
the adverb is due to the lower gram- 
matical constraint exerted on it. It 
is equally clear from these figures 
that the low predictability of the 
adjective, on the other hand, involves 
something besides class size and 
grammatical constraint. It may be 
that adjective alternatives have a 
greater tendency toward equiproba- 
bility or, possibly, that authors tend 
toward a greater avoidance of clichés 
in the use of adjectives than in the 
use of other word classes. At any 
rate, the data of the present experi- 
ment do not permit a convincing 
answer one way or the other. 

Position of omission.—The results 
pertaining specifically to position of 
omission are summarized in Table 5. 
As in Table 3, the data in Table 5 are 
derived from data already presented 


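TABLE 5

PERCENTAGE OF WORDS CORRECTLY
PREDICTED BY POSITION

[Table body not recoverable from the source. Columns: position of omission (initial, early medial, late medial, final) and mean; rows: 6-word, 11-word, and 25-word sentence lengths.]
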
in Table 1. In Table 5, however,
the number of correct predictions 
for each position of omission was 
taken without regard to word class, 
and the cells are based on 120 sen- 
tences (100 sentences for sentence 
final) rather than 20, as in the case of 
Table 1. To avoid confusion with the 
figures in Table 1, the entries in 
Table 5 are given as percentages 
rather than means. 
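
The pooling over word class can be sketched in the same way as the pooling over position. The code below recomputes position percentages from the Table 1 means; it is an illustration, not a reconstruction of the published Table 5, whose entries may differ by rounding. The function name is hypothetical.

```python
POSITIONS = ("initial", "early medial", "late medial", "final")

# Table 1 means: rows follow POSITIONS; columns are noun, verb, adj.,
# adv., function word, pronoun. None marks the untested cell.
TABLE_1 = {
    "6-word": [
        [3.79, 9.00, 1.46, 3.04, 9.04, 8.58],
        [4.54, 6.17, 4.00, 7.25, 13.08, 8.83],
        [5.29, 5.46, 5.50, 9.96, 11.67, 7.83],
        [5.54, 3.63, 1.46, 7.46, None, 5.96],
    ],
    "11-word": [
        [3.21, 5.83, 4.25, 5.38, 12.92, 11.46],
        [6.25, 8.79, 5.63, 4.79, 11.96, 13.25],
        [5.63, 7.88, 6.08, 8.46, 14.46, 11.46],
        [6.33, 8.29, 3.29, 7.17, None, 10.71],
    ],
    "25-word": [
        [3.92, 5.79, 4.08, 3.21, 11.96, 11.71],
        [7.46, 10.75, 5.25, 5.00, 13.88, 14.63],
        [3.17, 10.58, 5.17, 4.83, 13.54, 15.83],
        [3.75, 6.71, 4.54, 9.63, None, 11.25],
    ],
}


def percent_by_position(rows, sentences_per_cell=20):
    """Percentage correct per position of omission, pooled over classes."""
    result = {}
    for name, row in zip(POSITIONS, rows):
        scores = [x for x in row if x is not None]  # 5 classes in final
        result[name] = 100 * sum(scores) / (len(scores) * sentences_per_cell)
    return result


by_position = {length: percent_by_position(rows) for length, rows in TABLE_1.items()}
for length, values in by_position.items():
    print(length, {name: round(v, 1) for name, v in values.items()})
```

With these figures, both medial positions exceed the initial and final positions at every sentence length.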

Table 5 shows that words in the 
medial positions of omission were 
more predictable than words in the 
initial or final position. It is evident 
that a bilateral context exerts greater 
constraint than a unilateral context 
of the same length—regardless of
whether the context precedes or 
follows. Similar findings have been 
reported by Miller and Friedman 
(1957) and by Kaplan (1950). Kap- 
lan found that a bilateral context 
consisting of one word on each side 
of the dependent exerted greater 
constraint than either two words 
preceding or two words following. 
The present study indicates that a 
bilaterally distributed context exerts 
greater constraint in much longer
sequences as well. It would seem, 
therefore, that the proximity of seg- 
ments of context to the dependent 
is a more powerful factor in constraint 
than the length of context per se. 
Table 5 shows, for example, that a 
bilateral context of three words on 
one side and two words on the other 
exerts constraint equal to that exerted 
by a unilateral context of 10-24 words 
either preceding or following the
dependent.

Sentence length.—The data in
Table 5 indicate that predictability
increases with sentence length regardless
of sentence position. This is in
accord with the observation of information
theorists that constraint increases
with the increase in context
(Shannon, 1951). The data of the
present experiment, however, confirm 
another observation too: namely, 
that the relationship between con- 
straint and length of context does not 
go on indefinitely. There is a point 
beyond which increasing context 
will not further increase constraint. 
Thus, Table 5 shows no appreciable 
difference in predictability between 
11- and 25-word sentences. The im- 
plication is that the effect of context 
attains its maximum somewhere be- 
tween a length of 5 words and one of 
10 words. It is interesting to ob- 
serve that this interval—between 23 
and 45 letters, if one takes the average 
word length as 4.5 letters—includes
the limit of 32 letters suggested by 
Burton and Licklider (1955). Ap- 
parently, then, the amount of context 
required for maximum constraint is 
roughly of the same order for letter 
and word prediction—at least when 
the words are predicted within sen- 
tences of fixed length. 
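
The conversion from words to letters is simple arithmetic and can be checked directly. The sketch below uses the 4.5-letter average given in the text; the function name is illustrative. The lower bound of 22.5 letters corresponds to the rounded figure of 23 quoted above.

```python
MEAN_WORD_LENGTH = 4.5  # letters per word, the figure used in the text


def words_to_letters(n_words):
    """Approximate letter count for a context of n_words words."""
    return n_words * MEAN_WORD_LENGTH


lower, upper = words_to_letters(5), words_to_letters(10)
print(lower, upper)  # 22.5 45.0
print(lower <= 32 <= upper)  # True: the 32-letter limit lies in the interval
```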

Predictability when word class is 
uncontrolled.—As an estimate of how 
predictability might vary from one 
position of omission to the next 
in a sample of sentences in which the 
frequency of word classes in each 
position was uncontrolled, the pre- 
diction scores were weighted according 
to the probability of occurrence of 
each word class (Aborn & Rubenstein, 
1957) at the position in question. 
More specifically, the weighting was 
carried out as follows: Each entry in 
Table 1 was multiplied by the proba- 
bility of the class in the given condi- 
tion. The sum of these products for 
each position and sentence length 
(the mean number of correct responses 
per S in 20 sentences) was then 
divided by 20 and multiplied by 100 
to yield the percentage of correct 
TABLE 6

PERCENTAGE OF WORDS CORRECTLY
PREDICTED BY POSITION
AFTER WEIGHTING

[Table body largely lost in the source. Columns: position of omission; rows: sentence length and mean %. The only legible entries are 41 (6-word), 46 (11-word), and 53 (25-word), apparently in the early medial column.]

predictions in the total number of 
responses. 
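
The weighting procedure can be written as a short function. The sketch below is illustrative: the cell means are the initial-position, 6-word row of Table 1, but the class probabilities are hypothetical placeholders, chosen only so that function words and pronouns together account for .7 as the text reports for the initial position. The actual probabilities come from Aborn and Rubenstein (1957) and are not reproduced here.

```python
def weighted_percent(cell_means, class_probs, sentences_per_cell=20):
    """Weight each class mean by its probability, then express as a percentage."""
    weighted_sum = sum(cell_means[c] * class_probs[c] for c in class_probs)
    return 100 * weighted_sum / sentences_per_cell


# Table 1, initial position, 6-word sentences.
initial_6_word = {
    "noun": 3.79, "verb": 9.00, "adj": 1.46,
    "adv": 3.04, "fw": 9.04, "pro": 8.58,
}
# HYPOTHETICAL probabilities (not from the source); they sum to 1.0.
hypothetical_probs = {
    "noun": 0.15, "verb": 0.05, "adj": 0.05,
    "adv": 0.05, "fw": 0.40, "pro": 0.30,
}

result = weighted_percent(initial_6_word, hypothetical_probs)
print(round(result, 1))
```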

It is apparent from the data pre- 
sented in Table 6 that, when the 
frequencies of the word classes are 
left uncontrolled, all positions except 
the final show about the same pre- 
dictability. These results are most 
easily explained in terms of the 
relative frequency of the highly pre- 
dictable pronoun and function word. 


The low predictability of the final
position is due to the almost complete
absence of function words and the
low probability of occurrence of
pronouns (only about .05-.08). The
equal predictability of the initial and
medial positions—despite the advantage
of bilateral context in the case of
medials—results from the fact that
function words and pronouns have a
combined probability of .7 in initial
position but only .45 in medial
positions.

These data permit a rough compari- 
son between the predictability of 
letters omitted randomly and the 
predictability of letters constituting 
a word. Miller and Friedman (1957)
tested the ability of Ss to reconstruct
passages originally 300 letters in
length when various percentages of
the letters were randomly deleted
(omitted with the position of omission
indicated), abbreviated (omitted without
indication of omission), or replaced
by other letters. We may
regard the omission of words in the
present study as an abbreviation of
letter sequences—abbreviation rather
than deletion since position of omission 
is indicated only for the first letter. 
If the mean word length is taken to 
be 4.5 letters, the omission of one 
word in an 11-word sentence is 
roughly a 9% abbreviation while the 
omission of one word in a 6-word 
sentence is roughly a 17% abbrevia- 
tion. With 9% random abbrevia- 
tion, Miller and Friedman found that 
Ss could correctly predict about 98% 
of the missing letters; and with 17% 
random abbreviation, that Ss could 
correctly predict about 80% of the 
missing letters. According to the
results of the present study, Ss could
correctly predict about 47% of the
missing words in 11-word sentences
and about 38% of the missing words
in 6-word sentences (see Table 6).
To employ these figures as indices 
of letter predictability, it was neces- 
sary to incorporate the percentage 
of letters correctly predicted even 
when the total word was scored as 
incorrect. Examination of the data 
revealed that 7% of letters (averaged 
over all positions in the word) were 
correctly predicted under this cir- 
cumstance. In all, then, the per- 
centage of letters correctly predicted 
was 54% in 11-word sentences and 
45% in 6-word sentences. The pre- 
dictability of letters constituting a 
word is therefore little more than 
half as great as the predictability of 
letters omitted randomly. 
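
The arithmetic behind this comparison can be laid out step by step. The sketch below uses only the figures quoted in the text; treating every word as the same length when computing the abbreviation percentage is the simplification implied by the fixed 4.5-letter average. Variable names are illustrative.

```python
def abbreviation_percent(n_words):
    """Share of a sentence removed by omitting one word of average length."""
    return 100 / n_words


words_correct = {11: 47, 6: 38}        # % of omitted words predicted
letters_in_wrong_words = 7             # % of letters right in wrong words
random_letters_correct = {11: 98, 6: 80}  # Miller and Friedman (1957)

for n_words in (11, 6):
    letters = words_correct[n_words] + letters_in_wrong_words
    ratio = letters / random_letters_correct[n_words]
    print(n_words, round(abbreviation_percent(n_words)), letters, round(ratio, 2))
```

The ratios come out at about .55 and .56, which is the sense in which letters constituting a word are little more than half as predictable as letters omitted at random.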


SUMMARY 


In order to test the constraint exerted upon
words in sentences by three properties of
context, 1,380 sentences were selected from
about 3,000 sentences drawn randomly from
a number of popular magazines. One word 
was omitted from each sentence in a way that 
yielded three treatments of sentence length, 
within which were four treatments of position 
of omission, within which were six treatments 
of word class of the omitted word. The 
sentences were administered to 24 Ss, who 
were instructed to predict the missing word 
in each sentence with a single guess. Analysis 
of the predictability scores showed that: 

1. The length, distribution, and gram- 
matical structure of context are all inde- 
pendently effective sources of constraint on 
words in sentences. 

2. The predictability of words belonging 
to a given word class is, in general, inversely 
related to the size of that class. However, 
this relationship is disturbed in instances
where the context exerts so little grammatical
constraint that more than one class may
occur with near-equal probability.

3. Increasing the context beyond 10
words does not increase predictability. The
length at which context attains maximum
effectiveness lies between 5 and 10 words.

4. A bilaterally distributed context exerts
greater constraint than a totally preceding
or totally following context of the same
length. This generalization holds true for
long as well as short contexts.

5. When the frequency of word-class
occurrence is left uncontrolled, words have
almost the same predictability in all positions
in the sentence except the final, where
predictability is much lower.

6. The predictability of letters constituting
a word is only about half as great as the
reported predictability of letters omitted
randomly.


OF SUBSEQUENT GENERALIZATION ALONG
ANOTHER DIMENSION 1
Kent State University


It is becoming increasingly evident 
that the slopes of gradients of stimu- 
lus generalization are affected by a 
number of variables. Since it ap- 
pears that stimulus generalization is 
being used increasingly as an explana- 
tory concept, an increase in our very 
limited knowledge of the effect of such 
variables is needed. The experiment 
reported below is designed to shed 
further light on a variable previously 
investigated by Reinhold and Perkins 
(1955), the effect of discrimination 
training along one stimulus dimension 
upon generalization within a second 
dimension. 

In the Reinhold and Perkins study, 
rats in the experimental group were 
trained to discriminate one runway 
surface from another (e.g., rough =
S^D, smooth = S^Δ), while control groups
received continuous or intermittent
reinforcement under S^D and no presentations
of S^Δ. Subsequently, a change
in runway color produced a greater 
generalization decrement for the ex- 
perimental group than for either 
control group. Since nondifferential 
conditioning may be considered the 
limiting case of an easy discrimina- 
tion, one might take these results as 
indicating that the more difficult the 
discrimination along the dimension 
on which S^D and S^Δ fall (Dimension 1)
the steeper the generalization gradient 
along a second dimension (Dimension 


1 This study was supported by a grant from
2 Now at the University of Colorado.

3 Now at the State University of Iowa.


II). In the present study, designed
to test this extension of Reinhold and 
Perkins’ findings, two groups of rats 
received extensive differential conditioning
of bar pressing where S^D was
a high level of illumination (H). For
one group, S^Δ was a medium level of
illumination (M), and for the other
group, S^Δ was a low level of illumination
(L). The two groups received
identical training except that one S^Δ
was more like S^D than was the other,
and thus the discrimination was more
difficult for Group D than for Group
E. All Ss were then tested with S^D
and with S^N, a novel stimulus condition
which was the same as S^D except
for the addition (or subtraction) of
auditory stimulation.

The present study serves also as (a)
a control for the possible dependence
of Reinhold and Perkins’ results
on an artifact from the acquisition
of observing responses and (b) a
test of the generality of their findings,
since a different response and
procedure are used.


METHOD 
Apparatus 


A modified Skinner box, the interior of 
which was a 12-in. cube, was used. The walls 
were painted flat white except for an 8 × 8-in.
flashed glass stimulus window in the center 
of the roof and a small observation-ventila- 
tion window on the wall opposite the bar 
slot. The bar slot was in the middle of one 
side. The food delivery tube and food dish 
were in the corner to the left of the bar slot 
and a dish containing water was in the corner 
to the right. 
Three levels of illumination were used.
The highest illumination (H) was provided 
by a 40-w. bulb 2.75 in. above the ground- 
glass window. A 7.5-w. bulb 5.25 in. above 
the window provided light under Condition 
M. These light sources were contained in a
wooden box the floor of which was the stimu- 
lus window. Under Condition L, the only 
illumination was from a flashlight bulb out- 
side the apparatus about 6 in. above and 2 in. 
behind the food delivery tube. The small 
amount of light from this source which 
reached the interior of the apparatus came 
primarily through the bar slot and food 
delivery tube. Auditory stimulus conditions 
were (a) the absence of any sound except 
noises resulting from S’s activity and minor 
extraneous sounds from outside the sound- 
dampened room, and (b) the sound of a small 
electric buzzer placed on the table about a 
foot from the box. The behavior of Ss 
indicated that the buzzer was clearly audible 
and not markedly aversive. 

Food pellets (45 mg., supplied by the 
P. J. Noyes Co. of Lancaster, N. H.) were 
delivered by an Anger pellet dispenser, 
activation of which produces a rather loud 
click quite different (at least to human 
observers) from the click following nonre- 
inforced bar presses. 

Programming was by automatic controls 
located in a room adjoining the Skinner box. 
The click of relays at each bar depression 
was clearly audible to human observers in 
the room containing the Skinner box, but 
there was no evidence of other cues from the 
automatic controls. A continuous record of 
performance was obtained on a Gerbrands 
cumulative recorder and number of responses 
during specific test periods was also obtained 
from an impulse counter activated by each 
bar press. 


Subjects 


Sixteen 6-9-month-old naive female albino 
rats from the Kent State University colony 
served as Ss. 


Procedure 


The treatment of the two groups differed 
only in that S^Δ was Cond. M for Group D
and Cond. L for Group E. Condition H was
S^D for all Ss.

Preliminary training.—All Ss were first 
gradually reduced to running weight by 
limited daily feedings immediately after the 
time of day they were to be run. Half the Ss 
in each group were run under 85% body 
weight and half under 80%. All subsequent
treatment is outlined in Table 1. On Day 1 
of preliminary training S received 30 click- 
food sequences with the bar withdrawn and 
then, upon insertion of the bar, 10 reinforced 
bar presses. About the last dozen click-food 
presentations were used to reinforce approach 
to the bar slot. The procedure outlined for 
this day was not rigidly standardized. In 
a few instances when learning was particu- 
larly slow, it was completed during a second 
session 23 hr. after the first. There was no 
evidence of a relation between variations 
in procedure in preliminary training and S’s 
behavior during training or test periods. 
As outlined in Table 1, the reinforcement 
schedule was gradually shifted from con- 
tinuous reinforcement to 1/12 varied ratio 
reinforcement on Days 2 and 3 of preliminary 
training. 

Training.—As shown in Table 1, the 
training period consisted of differential 
conditioning with varied ratio reinforcement 
of the response to S^D and nonreinforcement 
of the response to S^Δ. Throughout training 
(except for the warm-up period on the first 
training day) presentations of S^D lasted 2 
min. after the first reinforcement and were 
reinforced on approximately a 1/12 varied 
ratio schedule. Slight unsystematic 
deviations from this ratio occurred because 
responses to the negative stimulus moved the 
programming tape. The presentations of S^Δ 
during training were for a period of 5 min. plus 
whatever time was required for the stimulus 
change to come at least 30 sec. after the last 
response. The 30-sec. period of no response 
was required for presentations of S^Δ because, 
for reasons given by Dinsmoor (1950), such 
a procedure should facilitate acquisition of a 
discrimination. 

Test days.—Each of the four test days was 
divided into two phases. The initial phase 
was a continuation of the differential training. 
The second phase consisted of four cycles of 
the following three treatments: (a) S^Δ, no 
reinforcement: 5 min.; (b) S^D, 1/12 varied 
ratio reinforcement: 60 sec. plus the time 
required for the next reinforcement to be 
programmed; (c) S^D or S^N, no reinforcement: 
60 sec. The test trials (c above) started 
approximately 2 sec. after the last 
reinforcement in b; S^D always prevailed during this 
2-sec. interval during which S was consuming 
the food pellet. 

For half the Ss in each group the buzzer 
was on only during presentation of S^N. 
While the other half of the Ss were in the box, 
the buzzer was silent only during S^N. The 
sequence of test trials was S^N S^D S^D S^N on Test 
TABLE 1 

OUTLINE OF EXPERIMENTAL PROCEDURE 

Period                   Treatment                      Duration
-----------------------  -----------------------------  ------------------------------
Preliminary training 1   click-food                     30 presentations
                         reinforced bar pressing        10 responses

Preliminary training 2   reinforced bar pressing        5 min.
                         1/2 VR reinforcement           5 min.
                         1/3 VR reinforcement           5 min.
                         1/4 VR reinforcement           5 min.
                         1/6 VR reinforcement           15 min.

Preliminary training 3   1/4 VR reinforcement           10 min.
                         1/6 VR reinforcement           10 min.
                         1/12 VR reinforcement          10 min.

Training                 1/6 VR reinforcement           5 min.
                         1/12 VR reinforcement          10 min.
                         0 reinforcement                5 min. plus 30 sec. no response
                         1/12 VR reinforcement          2 min.
                         0 reinforcement                5 min. plus 30 sec. no response
                         1/12 VR reinforcement          2 min.

                         0 reinforcement                5 min. plus 30 sec. no response
                         1/12 VR reinforcement          2 min.
                         Repeat for total of 6 cycles

Test, Phase 1            0 reinforcement                5 min. plus 30 sec. no response
                         1/12 VR reinforcement          2 min.
                         Repeat Phase 1 for total of 2 cycles

Test, Phase 2            0 reinforcement                5 min.
                         1/12 VR reinforcement          1 min. plus time for next
                                                        reinf. to be programmed
                         S^D or S^N, 0 reinforcement    1 min.
                         Repeat Phase 2 for total of 4 cycles


Days 1 and 4 and S^D S^N S^N S^D on Days 2 and 
3 for half the Ss. The opposite sequence was 
used for the others. Complete 
counterbalancing of sequence of test trials and buzzer 
condition was provided within each group 
with a total of 4 Ss. 


RESULTS 


Since our primary interest was in 
the results on test trials, performance 
during training was not recorded 
in such a way that an exact count 
of number of responses during specified 
periods could be obtained. However, 
the records show that a clear 
discrimination had developed by 
Training Days 3-5, and that it tended to 
appear a bit earlier in Group E than 
in Group D. By the last training day 
all Ss responded to S^D at a rapid rate 
(except for brief pauses after 
reinforcement), and to S^Δ at a much 
lower rate (except for rare bursts of 
responses). The discrimination was 
not usually as well developed on the 
first one or two cycles of the day. 
This suggests that there was some 
warm-up effect. 

On the last four cycles of the last 
training day, Ss responded to S^D 
at an over-all rate of from about 25 
responses per min., the lowest over-all 
rate for any S during any 2-min. S^D 
presentation, to about 120 responses 
per min. The rate ran somewhat 
higher for Group E (30 to 120 
responses per min.) than for Group D 
(25 to 70 responses per min.), but 
there was enough intergroup overlap 
so that it seems safe to conclude that 
if exact measures had been obtained 
no significant difference between the 
groups in rate of response to S^D 
would have been found. There did 
not seem to be any systematic 
difference between the groups in rate 
of responding to S^Δ. This rate ranged 
from one or two responses up to about 
40 during the 5-min. period on the last 
four cycles of the last training day. 
These results are in line with those 
of Frick (1948) who obtained the same 
rate of responding to S^Δ at the end of 
training when S^Δ was 25%, 10%, 
or .1% of the illumination level of 
S^D. 

The results of the test trials are 
summarized in Table 2 where the 
means and SD’s of the number of 
test responses to S^D and S^N made by 


TABLE 2 


MEAN AND SD OF THE NUMBER OF RESPONSES 
TO EACH TEST STIMULUS FOR 
EACH GROUP 


                          S^D                S^N
Test days    Group    Mean     SD        Mean     SD
Day 1        E        115.1    ...       74.9     53.5
             D        111.6    ...       21.4     15.6
Days 1-4     E        382.0    152.9     206.6    86.4
             D        ...      86.4      108.6    60.8
Ss in each group are shown for the 
first test day (top two rows) and for 
all four test days (bottom two rows). 
Examination of Table 2 indicates 
that: (a) both groups made very 
nearly the same number of responses 
to S^D (e.g., 115.1 and 111.6 for 
Groups E and D, respectively, on 
Test Day 1), (b) a generalization 
decrement was obtained for both 
groups (values in Column 3 are 
smaller than corresponding values in 
Column 1), and (c) Ss in Group E 
made many more responses to S^N 
than did those in Group D. 

On Test Day 1 and on all four test 
days combined, every S in each group 
made fewer test responses to S^N 
than to S^D. Thus, the generalization 
decrements for both groups differ 
significantly from zero (P < .01, 
expansion of the binomial used as a 
two-tailed test). As measured by 
the U test (Mann & Whitney, 1947), 
the number of responses to test 
presentations of S^N was reliably greater 
for Group E on Test Day 1 (U = 9, 
P = .014, two-tailed test) and on all 
test days combined (U = 13, P = .05). 
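
The binomial probability cited above can be checked by hand. The sketch below is illustrative only: it assumes eight Ss per group (the group size given in the Summary) and a chance probability of .5 that a given S makes fewer responses to the novel stimulus than to the positive stimulus. The function name is hypothetical and does not come from the original report.

```python
from math import comb


def two_tailed_sign_test(n_subjects: int, n_in_direction: int) -> float:
    """Two-tailed binomial (sign test) probability under p = .5."""
    # Work with the more extreme of the two possible directions.
    k = max(n_in_direction, n_subjects - n_in_direction)
    # One-tailed probability of k or more Ss out of n falling in that direction.
    one_tail = sum(comb(n_subjects, i) for i in range(k, n_subjects + 1)) / 2 ** n_subjects
    # Double for the two-tailed test, capped at 1.
    return min(1.0, 2 * one_tail)


# All eight Ss in a group responded less to the novel stimulus.
p = two_tailed_sign_test(8, 8)
print(round(p, 4))  # 0.0078
```

With every S in a group of eight showing the decrement, the two-tailed probability is 2 x (1/2)^8, which is below the .01 level reported.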

Two indices of the size of the 
generalization decrement were used. The 
first, S^N/S^D, the number of responses 
on S^N test trials divided by the 
number of responses on S^D test trials, 
gives the decrement relative to the 
level of responding under S^D. The 
absolute decrement is indicated by 
S^D - S^N, the number of responses on 
S^D test trials minus the number on 
S^N test trials. 
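
As a minimal sketch of the two indices just defined, the functions below compute the relative and absolute decrement for a single S. The response counts are hypothetical and chosen only for illustration; they are not data from this experiment, and the function names are not the authors'.

```python
def relative_decrement(novel: float, positive: float) -> float:
    """Relative index: responses on novel-stimulus trials divided by
    responses on positive-stimulus trials (smaller = larger decrement)."""
    return novel / positive


def absolute_decrement(novel: float, positive: float) -> float:
    """Absolute index: responses on positive-stimulus trials minus
    responses on novel-stimulus trials (larger = larger decrement)."""
    return positive - novel


# Hypothetical S: 120 responses on positive test trials, 30 on novel test trials.
print(relative_decrement(30, 120))  # 0.25
print(absolute_decrement(30, 120))  # 90
```

Note that the two indices run in opposite directions: a larger generalization decrement gives a smaller ratio but a larger difference.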

The groups differed significantly 
according to the former index for 
Test Day 1 (U = 3, P = .002) and 
all four test days (U = 9, P = .014). 
The absolute decrements for the two 
groups differed significantly on Test 
Day 1 (U = 3.5, P = .002), but fell 
short of significance for all four test 
days (U = 16, P = .104). 
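
The U statistic used in these comparisons can be obtained by counting, over all pairs of Ss drawn one from each group, how often the score from one group exceeds the score from the other. The sketch below is a plain-Python illustration with hypothetical scores (not the experimental data); ties are counted as one half, which is how fractional values such as U = 3.5 can arise. Significance levels would still have to be read from the Mann and Whitney tables.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U statistics for two independent samples."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0      # a outranks b
            elif a == b:
                u_a += 0.5      # ties are split between the groups
    u_b = len(sample_a) * len(sample_b) - u_a
    return min(u_a, u_b)


# Hypothetical response counts for two groups of eight Ss each.
group_e = [60, 75, 90, 40, 110, 85, 70, 95]
group_d = [20, 35, 15, 50, 25, 10, 45, 30]
print(mann_whitney_u(group_e, group_d))  # 2.0
```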
DISCUSSION 


Our results clearly support the 
hypothesis that the experiment was 
designed to test: that there is an increase 
in the slope of generalization gradients 
when there is a decrease in the difference 
between S^D and S^Δ in the discrimination 
training. They also support the 
hypothesis that there is a positive relation 
between the slope of the discrimination 
gradient along the dimension on which 
S^D and S^Δ fall (Dimension I) and the 
generalization gradient along Dimension 
II. This hypothesis, though based on 
relatively little evidence, seems more 
fruitful than the assumption that diffi- 
culty of the discrimination as such 
determines the slope of the generaliza- 
tion gradient along the second dimension. 
The hypothesized relationship between 
the two gradients has a number of 
testable implications, some of which will 
be briefly summarized. 

Since a warm-up period apparently 
does increase the slope of the discrimina- 
tion gradient along Dimension I, the 
above hypothesis implies that it should 
also increase the slope of the generaliza- 
tion gradient along Dimension II. A 
second implication of this hypothesis 
is that the slope of the gradient along 
Dimension II would be steeper following 
just enough trials to effect learning of an 
easy discrimination than following the 
same number of trials on a discrimination 
which was considerably more difficult 
and which had not yet been learned. 

An analysis of the relationship between 
stimulus generalization and response 
differentiation (e.g., Skinner, 1938) sug- 
gests further testable implications. Such 
an analysis is difficult to state clearly 
because of the semantic muddle resulting 
from variations in the use of the terms 
stimulus and response. Thus, we will 
attempt only to suggest a line of thought, 
not to present a precise analysis. 

In trial-and-error learning or operant 
conditioning (e.g., bar pressing) the 
class of responses which is followed by 
reinforcement includes a variety of 
movements. Any one of these movements 
will be reinforced when it occurs 
to certain stimulus patterns and not to 
others. That is, response classes which 
are defined in terms of their effect on 
the environment, here designated as R_E 
(e.g., depression of a lever sufficiently 
to make an electrical contact), would 
appear to consist of a complex set of 
S_C-R_M connections where S_C represents 
a stimulus complex (e.g., sight 
of the bar slot, pressure on the paws, 
proprioceptive cues, etc.) and R_M a 
rather specific movement pattern (e.g., 
forward shifting of weight to raised 
forepaws). Reinforcement of certain 
S_C-R_M sequences will presumably 
strengthen the tendency for this same 
R_M to occur to similar but inappropriate 
S_C’s, i.e., to stimulus complexes in which 
this R_M will not be effective in producing 
reinforcement. Thus, a given movement 
pattern (R_M) will be reinforced 
to certain stimulus complexes and not 
to others. This, of course, is another 
way of saying that differential 
conditioning is involved. 

The discriminations involved in simple 
trial-and-error learning are complex and 
thus the difficulty of these discrimina- 
tions cannot be specified precisely. 
Nevertheless, some testable hypotheses 
can be deduced. For example, the 
greater the precision or skill of the 
response class followed by reinforcement 
the more difficult the discriminations 
learned and the steeper the generalization 
gradient when, for example, level 
of illumination is altered. Thus, if 
reinforcement were contingent upon 
S’s stepping on a small treadle when 
pressing the bar rather than just pressing 
the bar, the generalization decrement 
would be greater with introduction of a 
novel stimulus. 


SUMMARY 


An experiment was conducted to deter- 
mine whether the degree of similarity of 
positive and negative stimuli during differen- 
tial operant conditioning would affect the 
size of the generalization decrement (slope 
of the generalization gradient) when the posi- 
tive stimulus was subsequently changed 
along another dimension. 

Two groups of eight white rats each were 
given extensive training during which a 
bar-pressing response was reinforced on a 1/12 
varied ratio schedule to one level of 
illumination, S^D, and never reinforced to a different 
level of illumination, S^Δ. Both Group D, 
for which positive and negative stimuli 
were similar, and Group E, for which the 
negative stimulus differed more from the 
positive, subsequently received identical 
test presentations of the positive stimulus, 
S^D, and of a novel auditory stimulus in an 
otherwise identical situation, S^N. 

A generalization decrement was obtained 
for every S. The two groups made 
approximately the same mean number of responses 
to S^D but Group E made significantly more 
responses to S^N. Indices of absolute and 
relative generalization decrements used were 
S^D - S^N and S^N/S^D, respectively, where S^D 
and S^N indicate number of responses to 
positive and novel test stimuli. According 
to the U test, relative and absolute 
generalization decrements were both significantly 
greater for Group D than for Group E. 

It is concluded that under the conditions 
of this experiment, training which increases 
the slope of the discrimination gradient along 
the dimension on which S^D and S^Δ fall also 
increases the slope of the generalization 
gradient on which S^D and S^N fall. 


Using a conception of meaning as an 
implicit mediating response, Mowrer 
(1954) has suggested that a sentence 
is a conditioning device and that 
communication takes place when the 
meaning response which has been 
elicited by the predicate is conditioned 
to the subject of the sentence. The 
results of several recent studies have 
shown that word meaning will indeed 
condition. Meaning has been 
conditioned to nonsense syllables (Staats 
& Staats, 1957), national and proper 
names (Staats & Staats, 1958a), and 
meaningful words (Staats & Staats, 
1958b), the semantic differential of 
Osgood and Suci (1955) being used to 
measure meaning. 


In each of these studies a meaning 
response has been conditioned directly 
to the verbal stimuli themselves. 
Mowrer, however, hypothesized 
further that the meaning of the predicate 
is conditioned to the stimuli produced 
by the meaning response elicited by the 
subject, in addition to being directly 
conditioned to the subject itself. 
It is in this way that communication, 
which takes place on a language level, 
nevertheless changes the behavior of 
the listener with respect to the object 
itself. Using Mowrer’s example, when 
the sentence “Tom is a thief” is 
spoken, the meaning of “thief” is 
conditioned both to the word “Tom” 
and to the meaning response elicited 
by “Tom.” The meaning elicited by 
“Tom” is, however, part of the same 
response which Tom himself elicits. 
Thus, Tom, when later present, 
elicits a characteristic response in the 
former listener which now in turn 
elicits the conditioned meaning of 
“thief.” The person's behavior 
toward Tom will have changed 
accordingly, the change being mediated 
by the conditioned meaning of “thief.” 

1 This study is part of a research project 
on verbal behavior sponsored by the Office 
of Naval Research under Contract Nonr 2305 
(00) with Arizona State University. The 
authors wish to thank Larry P. Nims for 
help in data collection. 

Mowrer’s hypothesis on communi- 
cation demands supporting evidence 
that a word-meaning response can be 
conditioned to a word-meaning re- 
sponse. This has yet to be shown. 
However, there are a number of 
studies which indicate that various 
types of responses (e.g., GSR) can be 
conditioned to word meaning. This 
has been shown, as summarized 
elsewhere (Cofer & Foley, 1942; 
Osgood, 1953), by the fact that a 
response conditioned to a word will 
generalize, for example, to the 
synonym of the word, the generalization 
presumably being mediated by the common 
meaning. 


The present study tests the hypothesis 
that a word-meaning response can be 
conditioned to the meaning response of 
another word. The procedure used for 
language conditioning in the earlier 
studies (Staats & Staats, 1958a, 1958b; 
Staats & Staats, 1957) and the general 
paradigm for semantic generalization 
provide the experimental method. Fig- 
ure 1 shows the schematization of a 
procedure for conditioning a meaning 
response to a meaning response and for 
measuring the conditioned meaning. 
The word ROCK is presented visually 
prior to the auditorily presented word 
BEAUTY. Both words elicit meaning 
responses (RM_1 and RM_2), which also 
produce characteristic stimuli 
(response-produced cues, SM_1 and SM_2). 
BEAUTY (and in the later trials, FAMILY 
and SWEET) actually elicits a total 
meaning response composed of two 
components, positive evaluative 
meaning, RM_2, and the other distinctive 
responses that characterize the word 
meaning, R_B. Since SM_1 and R_B and 
RM_2 are contiguous, associations are 
formed between SM_1 and each of the 
two responses. In the following 
presentations of ROCK with FAMILY and 
SWEET the association between SM_1 
and RM_2 is further strengthened since 
FAMILY and SWEET also elicit a 
positive evaluative meaning component. 
The strengthened association is 
represented by the increase in heaviness of 
the line connecting SM_1 and RM_2. The 
associations of SM_1 to R_B, R_F, and R_S 
are not strengthened since they occur 
only once and are followed by other 
associations which are inhibitory. This 
would also be the case with the 
associations involving the word stimuli 
themselves (not depicted on the figure). If 
the hypothesized association between 
SM_1 and RM_2 is actually formed, then 
the word STONE should also elicit the 
evaluative meaning component, as 
depicted at the bottom of Fig. 1, since 
STONE elicits the same meaning, RM_1, 
as ROCK, RM_1 produces SM_1, and so 
on. Measurement of the evaluative 
meaning of STONE could be made on an 
appropriate semantic differential scale 
to test the hypothesis. 


FIG. 1. Conditioning a positive 
evaluative meaning response to the meaning of the 
word ROCK, and the action of the resulting 
chain of meaning responses in mediating a 
“pleasant” rating of STONE on an 
evaluative semantic differential scale. 

Thus, the experimental hypothesis 
involves a two-stage link of mediating 
meaning responses between the stimulus 
(synonym of the word which is 
conditioned) and overt rating response, i.e., 
an S--RM_1-SM_1--RM_2-SM_2--R chain. 





METHOD 


Subjects.—The Ss were 163 students from 
classes in elementary psychology and classes 
in elementary education at Arizona State 
University, randomly selected and run in 
groups of from 15 to 20. Five groups of 
Ss were run under each of the two different 
experimental conditions. Groups were ran- 
domly assigned to one of the experimental 
conditions. Participation in the experiment 
fulfilled a course requirement. 

Procedure.—The language conditioning 
procedure previously employed (Staats & 
Staats, 1958a, 1958b; Staats & Staats, 
1957) was used. Two types of stimuli were 
used: words presented by slide projection on 
a screen (CS words), and words presented 
orally by the E (UCS words). The Ss 
repeated each UCS word aloud immediately 
after E pronounced it. Ostensibly the task 
was to learn separately the words presented 
in the two different ways. 

Two tasks were first presented to train 
Ss in the procedure and to orient them 
properly for the main phase of the experiment. 
The first task was to learn five visually 
presented words, each shown four times, in 
random order. Learning was tested by recall. 
The second task was to learn 33 orally 
presented words, with learning tested by 
recognition. 

The Ss were then told that the primary 
purpose of the experiment was to study “how 
both of these types of learning take place 
together—the effect that one has upon the 
other.” Six new words were used for visual 
presentation: MARK, CARPET, TIE, 
GADGET, ROCK, and BELT. CARPET 
and ROCK were the CS's to be conditioned. 
These six words were shown individually 
on the screen, exposed for 5 sec. About 1 sec. 
after a word appeared on the screen, E 
pronounced the word with which it was paired. 
The intervals between exposures were less 
than 1 sec. The Ss were told that they could 
learn the visually presented words by just 
looking at them, but that they should 
simultaneously concentrate on pronouncing 
the orally presented words aloud and to 
themselves. 

Each of the visually presented words was 
given 14 conditioning trials. Order of the 
words was randomized, except that no word 
appeared more than twice in succession, 
in order that no systematic associations 
should be formed among them. On each 
presentation of a given CS word, it was paired 
with a different UCS word; thus, 84 different 
UCS words were used. The CS words 
CARPET and ROCK were always paired 
with UCS words having evaluative meaning, 
while the other four CS words were paired 
with words having no systematic meaning. 
For one experimental condition (Group I), 
CARPET was paired with words which had 
positive evaluative meaning, e.g., BEAUTY, 
FAMILY, SWEET, while ROCK was 
paired with words which had negative evalua- 
tive meaning, e.g., BITTER, UGLY, CRIM- 
INAL. For Group II, the order of CARPET 
and ROCK was reversed, CARPET being 
paired with the negative words and ROCK 
with the positive words. Most of the 
systematically meaningful words were taken 
from Osgood and Suci (1955) or Jenkins, 
Russell, and Suci (1957). 
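
The trial structure described above (six CS words, 14 trials each, no word more than twice in succession) can be sketched as follows. The report does not say how the random order was produced, so the reshuffle-until-valid approach, the function names, and the seed are assumptions made purely for illustration.

```python
import random

CS_WORDS = ["MARK", "CARPET", "TIE", "GADGET", "ROCK", "BELT"]


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def conditioning_order(words, trials_per_word=14, max_run=2, seed=0):
    """Shuffle the trial list until no word occurs more than max_run times in a row."""
    rng = random.Random(seed)
    pool = [w for w in words for _ in range(trials_per_word)]
    while True:
        rng.shuffle(pool)
        if longest_run(pool) <= max_run:
            return list(pool)


order = conditioning_order(CS_WORDS)
print(len(order))  # 84 trials, one UCS word per trial
```

Because each CS presentation is paired with a different UCS word, the 84 positions in this order correspond to the 84 UCS words mentioned in the text.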

When the conditioning phase was com- 
pleted, Ss rated the evaluative meaning of 
MARK, RUG, TIE, GADGET, STONE, 
and BELT on a semantic differential scale 
of pleasant-unpleasant. RUG and STONE 
are synonyms of CARPET and ROCK, the 
words to which meanings were conditioned 
in the previous phase of the experiment. 
Extreme pleasantness was indicated by a 
rating of 1 (left side of scale) and extreme 
unpleasantness by 7. The Ss were then 
asked to state anything they had thought 
of concerning the experiment, especially its 
purpose. They were also asked to state 
anything they might have heard about the 
experiment. 


RESULTS 


None of the Ss indicated awareness 
of the synonym relationship between 
the two words rated on the semantic 
differential and the two presented 
in the conditioning phase of the 
experiment. While two Ss were 
aware that certain “pleasant” or 
“unpleasant” words had been 
systematically paired with CARPET 
or ROCK, neither saw any relation- 
ship between this fact and the meas- 
urement of the meaning of their 
synonyms. None of the Ss reported 
having heard anything of relevance 
about the experiment. Three Ss 
were randomly eliminated from the 
data, however, in order to preserve 
equal N’s in the counterbalanced 
design. The resulting N was 160. 

Before combining the data of the 
five subgroups run under each condi- 
tion, the data were analyzed to test 
the randomness of the groups. Dif- 
ferences among the means of the 
meaning scores for RUG and STONE 
were tested both for the five 
subgroups of Group I and for the five 
subgroups of Group II. In addition, 
the differences in amount of condition- 
ing among the five subgroups of 
Group I and Group II were tested by 
comparing mean “‘difference’’ scores. 
These difference scores were obtained 
by subtracting each S’s score on the 
synonym of the positively conditioned 
word from his score on the synonym 
of the negatively conditioned word. 
Analyses of variance indicated that 
none of the differences among the 
subgroups of Group I or Group II 
was significant. 
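
The difference score described above can be sketched as a small function. Which synonym counts as "negative" depends on the group: in Group I, ROCK (and hence STONE) was paired with negative words, while in Group II it was CARPET (and hence RUG). The ratings used below are hypothetical, and the function name is illustrative rather than the authors'.

```python
def difference_score(rug: int, stone: int, group: str) -> int:
    """Rating of the negatively conditioned synonym minus rating of the
    positively conditioned synonym (scale: 1 = pleasant, 7 = unpleasant)."""
    if group == "I":       # CARPET positive, ROCK negative
        return stone - rug
    if group == "II":      # CARPET negative, ROCK positive
        return rug - stone
    raise ValueError("group must be 'I' or 'II'")


# Hypothetical Ss: positive scores indicate conditioning in the expected direction.
print(difference_score(rug=2, stone=5, group="I"))   # 3
print(difference_score(rug=4, stone=3, group="II"))  # 1
```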


TABLE 1 

MEANS AND SD's OF SEMANTIC 
DIFFERENTIAL SCORES OF 
THE SYNONYM WORDS 

                    Words
              RUG             STONE
Group     Mean    SD      Mean    SD
I         2.72    1.41    4.01    1.68
II        3.25    1.50    3.48    1.82


Note.—The words were rated on a 7-point scale with 
Pleasant as 1 and Unpleasant as 7. 


TABLE 2 

ANALYSIS OF VARIANCE 

Source                   df       MS        F
Between Ss
  Groups                  1      .00
  Error                 158     2.30
Within Ss
  Mediated meaning        1    22.58     9.22*
  Words                   1    45.75    18.67**
  Residual              158     2.45
Total                   319

*P < .005 
**P < .001 


The data of the subgroups were 
therefore combined, and in Table 1 
are presented the means and SD’s 
of the meaning scores for Groups I 
and II. The table represents the 
2 × 2 experimental design. 

Three variables were involved in 
the design: mediated conditioned 
meaning (pleasant and unpleasant); 
synonym words (RUG and STONE); 
and Groups (I and II). The scores 
on the semantic differential given 
to each of the two words were 
analyzed in a 2 × 2 latin square 
(Lindquist, 1953, p. 278). 

The analysis of the data is pre- 
sented in Table 2. The results of the 
analysis indicate that the hypothe- 
sized conditioning and mediation did 
occur. The F for the mediated 
conditioned meaning variable was sig- 
nificant at the .005 level. The differ- 
ence between the marginal means for 
the two synonym words was also 
significant (P < .001), RUG being 
rated more pleasant than STONE. 
The difference between Groups I and 
II was not significant. 
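
The F ratios in Table 2 can be reproduced from the tabled mean squares. The sketch below assumes, as the degrees of freedom suggest but the report does not state explicitly, that both within-Ss effects were tested against the within-Ss residual; the variable and function names are illustrative.

```python
# Mean squares as reported in Table 2.
MS_MEDIATED_MEANING = 22.58
MS_WORDS = 45.75
MS_RESIDUAL = 2.45  # within-Ss residual, 158 df


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F is the effect mean square divided by the error mean square."""
    return ms_effect / ms_error


f_mediated = f_ratio(MS_MEDIATED_MEANING, MS_RESIDUAL)
f_words = f_ratio(MS_WORDS, MS_RESIDUAL)
print(f"{f_mediated:.2f} {f_words:.2f}")  # 9.22 18.67
```

Both values agree with the tabled F's, each on 1 and 158 df.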


DISCUSSION 


The hypothesis involving the 
two-stage chain of mediating responses 
between the stimulus and overt response 
was substantiated. The 
S--RM_1-SM_1--RM_2-SM_2--R chain may be 
considered to be in part a chain of 
implicit meaning responses, of which the Ss 
were unaware. With respect to the 
last link in the chain, it is assumed that 
the rating response has previously been 
connected to evaluative mediating 
responses. In addition, the SM_2--R 
association may have been strengthened 
as a result of the instructions for the 
use of the semantic differential. 

Bousfield, Cohen, and Whitmarsh 
(1958) have suggested that a group of 
implicit verbal responses which is elicited 
by a word comprises the meaning of the 
word. This definition of meaning is 
consistent with the interpretation of 
meaning presented by Noble (1952). 
Bousfield et al. then state that the 
amount of generalization from one word 
to another is a function of the extent 
to which their word associates overlap. 
The present study could also be inter- 
preted in this manner, i.e., the generali- 
zation of conditioned meaning from 
CARPET to RUG could have occurred 
because of their common word associ- 
ates. Or it could be said that, in the 
measuring phase of the experiment, 
when RUG was presented it elicited 
CARPET, one possible word associate, 
and CARPET elicited the evaluative 
meaning response, and so on. Thus, 
the present study is consistent both 
with the interpretation that meaning 
is an implicit response, as espoused by 
such researchers as Cofer and Foley 
(1942), Mowrer (1954), and Osgood 
(1953), and with the interpretation 
that meaning is a group of word 
associations elicited by a word. Mowrer also 
points out in the example mentioned in 
the introduction that a change in 
behavior toward Tom resulting from the 
sentence “Tom is a thief” could occur 
because Tom would elicit the verbal 
response “Tom” and this would then 
elicit the meaning of “thief.” He 
concludes, however, that “While this 
hypothesis is logically tenable and may 
in some cases correspond to reality, 
it has . . . rather limited applicability” 
(Mowrer, 1954, p. 668). 

It should also be pointed out that the 
present study is relevant to matters 
other than communication, e.g., think- 
ing. Russell and Storms (1955) have 
discussed the fact that implicit verbal 
chains are frequently used to account 
for thinking. They also demonstrated, 
following earlier attempts of Bugelski 
and Scharlock (1952), and Peters (1935) 
to show mediation, that previously 
established word associations could me- 
diate improved paired-associate learning. 
The present study concerns a different 
type of implicit mediating response, and 
suggests that meaning responses may 
be involved in chains of “thinking” 
responses. 

Fewer Ss were aware in this study 
that certain CS words in the condition- 
ing phase were paired with a certain 
type of UCS word than in the original 
language conditioning studies. For ex- 
ample, in the other study using meaning- 
ful words (Staats & Staats, 1958b), 
9 of 72 Ss became aware of at least one 
of the CS-UCS relationships. In the 
studies using nonsense syllables (Staats 
& Staats, 1957) and names (Staats & 
Staats, 1958a) as CS, 9 of 86, and 17 of 93 
Ss, respectively, became aware. (It is 
interesting that more aware Ss resulted 
when names were used as CS.) In the 
present study, only 2 Ss of the 163 
demonstrated this type of awareness. 
It is possible that the reduced proportion 
of aware Ss resulted from the use of 
fewer conditioning trials (14 instead of 
18). Or, this effect may have resulted 
because Ss did not rate the words to which 
evaluative meaning was conditioned, i.e., 
the rating itself may be a determinant 
of awareness in this situation. 
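
To make the comparison of awareness rates concrete, the counts quoted in this paragraph can be converted to percentages. The counts are taken from the text; the dictionary layout and function name are illustrative only.

```python
# (number of aware Ss, total Ss) as quoted in the paragraph above.
AWARE_COUNTS = {
    "nonsense syllables (Staats & Staats, 1957)": (9, 86),
    "names (Staats & Staats, 1958a)": (17, 93),
    "meaningful words (Staats & Staats, 1958b)": (9, 72),
    "present study": (2, 163),
}


def percent_aware(aware: int, total: int) -> float:
    """Percentage of Ss who became aware of a CS-UCS relationship."""
    return 100.0 * aware / total


for study, (aware, total) in AWARE_COUNTS.items():
    print(f"{study}: {percent_aware(aware, total):.1f}%")
```

The earlier studies fall at roughly 10% to 18% aware Ss, against roughly 1% in the present study.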


SUMMARY 


In this study two words, ROCK and 
CARPET, were individually presented on a 
screen 14 times, each time paired with the 
auditory presentation of a different word 
(UCS word). While the UCS words were 
different, they all had an identical meaning 
component. For one group, CARPET was 
paired with words of positive evaluative 
meaning and ROCK was paired with words 
of negative evaluative meaning. For another 
equal-sized group of Ss this was reversed, 
i.e., ROCK was paired with positive evalua- 
tive words and CARPET with negative 
evaluative words. Later, for both groups, 
the evaluative meaning of the synonyms 
of ROCK and CARPET (i.e., STONE and 
RUG) was measured on a semantic differen- 
tial scale. It was found that the synonyms 
had significantly acquired the evaluative 
meaning of the auditorily presented words 
with which the original visually presented 
words had been paired. Thus, evaluative 
word meaning was conditioned to a word (or, 
rather, to the meaning of the word) and the 
conditioned meaning generalized to the syn- 
onym of the word. An interpretation of this 
procedure involved a two-stage chain of im- 
plicit mediating meaning responses between 
the evocation stimulus and the overt response, 
i.e., an S--RM_1-SM_1--RM_2-SM_2--R 
chain. 


EFFECT OF SCHEDULE AND SEVERITY OF 
PUNISHMENT ON VERBAL BEHAVIOR¹ 


IRIS C. ROTBERG² 
Johns Hopkins University 


This study is concerned with the 
influence of schedule and severity 
of punishment on human verbal 
behavior, both during administration 
and after removal of the punishing 
stimulus. 

Studies of animal behavior which 
provide variables most relevant to 
punishment in human behavior are 
those of Skinner (1938) and Estes 
(1944). Estes examined variables 
which influence the effects of punish- 
ment on the lever-pressing behavior 
of rats. He found that: (a) 
continuous punishment produces a greater 
initial depression of response than 
punishment on a partial schedule; 
however, recovery from partial 
punishment is probably somewhat slower; 
(b) after short periods of mild 
punishment, the total number of 
elicitations of a response necessary for 
complete extinction is equal to the 
number of elicitations of the response 
needed when no punishment is given. 
With punishment, there is a 
temporary decrease of response rate when 
punishment is administered, and a 
“compensatory” increase when 
punishment is removed. With prolonged 
punishment, or with short periods of 
severe punishment, the total number 
of unreinforced elicitations of the 
response necessary for extinction may 
be reduced, but the total length of 
time required for extinction will 
remain the same. 

Problems investigated in the pres- 
ent research stem from studies of 
animal behavior. The method em- 
ployed in this experiment, however, 
is derived from those learning studies 
which demonstrate a change in the 
frequency of occurrence of a rein- 
forced verbal response. For example, 
Thorndike (1932, 1949) used the 
verbal response ‘‘Right” for reward 
and “Wrong” for punishment in 
order to determine the effect of 
reinforcement on verbal behavior. 
More recently, other studies (e.g., 
Cohen, Kalish, Thurston, & Cohen, 
1954; Greenspoon, 1955; Kanfer, 1954)
have demonstrated that simple verbal 
responses can be modified by rein- 
forcement. For example, Green- 
spoon (1955) asked Ss to speak 
successive discrete words, and used 
either the verbal stimulus mmm-hmm
or nonverbal stimulation by tones 
and lights to reinforce all plural 
nouns spoken by S.

The present experiment measured 
the effects of continuous punishment, 
aperiodic punishment, nonpunish- 
ment, and reward upon: (a) number of 
responses emitted and (b) reaction 
time, i.e., the time elapsing between 
the stimulus and S’s response. The 
punishment was either a verbal reprimand
(“mild” punishment) or an
electric shock (“severe” punishment).
The effects of the experimental vari- 
ables were analyzed both for the 
period during which punishment was
administered (punishment period) and
for the period after it was removed 
(extinction period). 


METHOD 
Subjects 


The Ss were 80 male undergraduate 
students from introductory psychology classes 
at Johns Hopkins University. They were 
randomly assigned to eight groups of 10 Ss 
each. 


Word List 


A word association list was used which 
consisted of 20 trials of 20 words each. Ten 
words of each 20 were critical words to which 
antonym responses could easily be given; 
the remaining 10 words were neutral words 
to which it was virtually impossible for 
antonyms to be given. The division into 
trials was for purposes of tabulation and 
randomization; to the Ss, the entire list was 
continuous. The first 100 words of the list 
constituted the training period, and the 
word order was the same for all Ss. There 
were 10 different randomizations of the final 
300 words of the list, each randomization 
used for one S under each experimental 
condition. 
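
The list structure just described can be sketched in Python. This is only an illustrative sketch under stated assumptions: the words are placeholder labels (the actual words are not given in the report), and each randomization is assumed to shuffle word order within a trial, which the report does not specify.

```python
import random

N_TRIALS = 20          # 20 trials of 20 words each (400 words)
TRAINING_TRIALS = 5    # the first 100 words: same order for all Ss
N_RANDOMIZATIONS = 10  # orders of the final 300 words


def make_trial(trial_no):
    """Placeholder words for one trial: 10 critical and 10 neutral."""
    critical = [(f"critical_{trial_no}_{i}", "critical") for i in range(10)]
    neutral = [(f"neutral_{trial_no}_{i}", "neutral") for i in range(10)]
    return critical + neutral


def build_list(randomization_no):
    """Return the 400-word list for one of the 10 randomizations."""
    words = []
    for t in range(N_TRIALS):
        trial = make_trial(t)
        if t < TRAINING_TRIALS:
            seed = t  # training words: identical order for every S
        else:
            seed = 1000 * (randomization_no + 1) + t  # varies by randomization
        # Assumption: order is shuffled within each trial.
        random.Random(seed).shuffle(trial)
        words.extend(trial)
    return words


lists = [build_list(r) for r in range(N_RANDOMIZATIONS)]
```

Each of the 10 lists would then be used for one S under each experimental condition, so that word order is balanced across conditions.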

The critical and neutral words were chosen 
on the basis of a preliminary word associa- 
tion test given to 10 additional Ss. Every 
antonym response was rewarded by the 
verbal response “Right.” Critical words 
chosen were those to which antonym responses 
were given by at least 9 of the 10 Ss; neutral 
words were those to which no antonyms were 
given by any of the 10 Ss. 


Apparatus 


Shocks were administered through elec- 
trodes attached to the inner side of S's left 
wrist. The shock was supplied by
an inductorium and a 4.5-v. DC source.
The same setting was used for all Ss. 

Verbal responses were recorded by tape 
recorder, and reaction times were obtained 
from the tape. 


Experimental Design 


The Ss were divided into two groups which 
received either mild or severe punishment. 
The Ss from each level of punishment were 
divided into four groups: (a) continuous 
punishment, (b) aperiodic punishment, (c)
nonpunishment, and (d) reward. Although
the nonpunishment and reward groups never 
received punishment, they were divided, for 
control purposes, into mild and severe
categories. These categories represented only 
a difference in directions given (see below). 
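
The 2 x 4 arrangement of groups, with random assignment of the 80 Ss to eight groups of 10, can be sketched as follows. The function name, the seed, and the use of integer S identifiers are illustrative assumptions; the report states only that assignment was random.

```python
import random
from itertools import product

SEVERITIES = ("mild", "severe")
SCHEDULES = ("continuous", "aperiodic", "nonpunishment", "reward")


def assign_subjects(subject_ids, per_group=10, seed=0):
    """Randomly assign Ss to the eight severity-by-schedule groups."""
    groups = list(product(SEVERITIES, SCHEDULES))
    ids = list(subject_ids)
    if len(ids) != per_group * len(groups):
        raise ValueError("need exactly per_group Ss for every group")
    random.Random(seed).shuffle(ids)  # seed is arbitrary, for repeatability
    return {
        group: ids[i * per_group:(i + 1) * per_group]
        for i, group in enumerate(groups)
    }


assignment = assign_subjects(range(80))
```

For the nonpunishment and reward groups, the severity label marks only which set of directions the Ss heard.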

There were three experimental periods: 
(a) training period, (b) punishment period, 
and (c) extinction period. 

Training period.—The Ss from all groups 
were rewarded by E’s saying ‘‘Right” follow- 
ing each antonym response for at least two 
trials. The Ss who did not meet the training 
criterion (antonym responses to 9 of the 
last 10 critical words) in two trials were 
administered additional words from the 
training list; Ss who did not meet the criterion
by the end of five trials were replaced. 
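
The training criterion can be expressed as a short check. In this sketch the criterion is evaluated at the end of each trial; that granularity is an assumption, since the report says only that additional words from the training list were administered. Function names are hypothetical.

```python
def meets_criterion(critical_responses, window=10, required=9):
    """True if S gave antonyms to at least 9 of the last 10 critical words.

    critical_responses: booleans in presentation order, one per critical
    word, True where the response was an antonym.
    """
    recent = critical_responses[-window:]
    return len(recent) == window and sum(recent) >= required


def training_outcome(critical_by_trial, min_trials=2, max_trials=5):
    """Return the trial at which training ends, or None if S is replaced."""
    seen = []
    for trial_no, responses in enumerate(critical_by_trial[:max_trials], start=1):
        seen.extend(responses)
        # All Ss receive at least two training trials.
        if trial_no >= min_trials and meets_criterion(seen):
            return trial_no
    return None  # criterion not met by the end of five trials
```

For example, an S who gives antonyms to every critical word finishes training after two trials, whereas an S who never reaches 9 of 10 within five trials is replaced.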

Punishment period.—The punishment period 
consisted of five trials. The continuous 
punishment groups were punished for all 
antonyms emitted. The aperiodic punish- 
ment groups were punished for 50% of the 
antonyms emitted. The nonpunishment 
groups were neither punished nor rewarded 
for any responses. The reward groups were 
rewarded for all antonym responses. 

Extinction period.—The extinction period 
consisted of 10 trials. The continuous 
punishment, aperiodic punishment, and non- 
punishment groups were neither punished 
nor rewarded for any responses. The reward 
groups were rewarded for all antonym 
responses. 
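
The contingencies of the three periods can be summarized as a single decision rule. The sketch below is illustrative only: the function name is hypothetical, and the aperiodic schedule is modeled as an independent 50% chance on each antonym, whereas the report does not state how the punished 50% were selected.

```python
import random

SCHEDULES = ("continuous", "aperiodic", "nonpunishment", "reward")
PERIODS = ("training", "punishment", "extinction")


def feedback(schedule, period, is_antonym, rng=None, p_aperiodic=0.5):
    """Return 'reward', 'punish', or None for a single response.

    'punish' stands for E's saying "Wrong" (mild) or a shock (severe);
    'reward' stands for E's saying "Right".
    """
    if schedule not in SCHEDULES or period not in PERIODS:
        raise ValueError("unknown schedule or period")
    if not is_antonym:
        return None  # only antonym responses were ever followed by feedback
    if period == "training" or schedule == "reward":
        return "reward"
    if period == "extinction" or schedule == "nonpunishment":
        return None
    if schedule == "continuous":
        return "punish"
    # Aperiodic punishment (assumed: independent 50% chance per antonym).
    rng = rng or random
    return "punish" if rng.random() < p_aperiodic else None
```

Under this rule the reward groups are the only ones whose contingency is unchanged across all three periods.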


Procedure 


All Ss were individually tested. They 
were seated at a table, with their backs to E, 
and were informed that the session would be 
taped in order to facilitate the recording of 
their answers. 

The Ss in the mild punishment groups were 
given the following instructions: “I am going 
to read you a list of words. Please answer 
to each word with another word. After 
each of your answers, I will indicate whether
you are right or wrong. If your answer is
neither right nor wrong, I won't say anything.
Please keep a record of the number of times 
you are right and the number of times you are 
wrong. If I don’t say anything, don’t write 
anything down. Try to do as well as you 
can.” 

The Ss in the four severe punishment 
groups received the same instructions, except 
that they were told that shock would be 
administered to indicate wrong answers. 

The Ss were not informed of changes in 
experimental periods.


RESULTS


The data were analyzed by means 
of a mixed design (Lindquist, 1953, 
Type III). The incidence of extreme 
heterogeneity of variance among groups 
when the reward group was included 
necessitated the exclusion of that 
group from the analysis of variance. 
The design made it possible to com- 
pare gross differences and trends with 
respect to the experimental variables
(continuous punishment, aperiodic
punishment, and nonpunishment),
levels of punishment (mild and
severe), and trials (each including 
10 critical words). 


Number of Antonyms Emitted 


Punishment period.—Figure 1 pre- 
sents the mean number of antonym 
responses per trial for each of the 
eight groups. The data indicate that, 
with the exception of the aperiodic 
mild condition, both mild and severe 
punishment are effective in reducing 
the number of antonyms emitted by 
Ss. Aperiodic punishment is effective 
only when it is severe, while continu- 
ous punishment is effective under 
either mild or severe presentation. 

The results of the analysis of 
variance are presented in Table 1. 
Because Bartlett's test indicated het- 
erogeneity of variance among groups 
(P < .05), it was necessary, for 
pair-wise comparisons, to use the 
error term computed from the data 
involved in the particular test. The 
mean number of antonyms emitted 
under the nonpunishment and aper- 
iodic mild conditions is significantly 
higher than the mean number emitted 
under either the aperiodic severe con- 
dition or the continuous mild or con- 
tinuous severe conditions (P < .025). 
Although the reward groups were not 
included in the analysis of variance, 
it may be noted that the number of
antonyms emitted by the reward
groups (pooled) is reliably greater
than the number emitted by the nonpunishment
groups (P < .001).

Extinction period.—It will be recalled
that the only differences among
the groups in the extinction period 
were the experimental and punish- 
ment conditions that had been ad- 
ministered them during the punish- 
ment period. Figure 1 presents the 
mean number of antonym responses 
per trial for each of the eight groups. 
The data indicate that the rank 
ordering of the groups according to 
number of antonyms emitted does not 
change after punishment is removed. 

The results of the analysis of
variance are presented in Table 1. 
Bartlett's test indicated heterogeneity 
of variance among groups (P < .10). 
For this reason, the simple tests of 
significance were computed in the 
manner previously described. The 
mean numbers of antonyms emitted 
under the nonpunishment and aper- 
iodic mild conditions are significantly 
higher than the mean numbers emitted 
under either the continuous mild 
or the continuous severe conditions 
(P < .005). The difference between 
the aperiodic mild and aperiodic
severe groups approaches the .05 
significance level (P < .055). The 
number of antonyms emitted by the 
reward groups (pooled) is reliably
greater than the number emitted by 
the nonpunishment groups (P < .001). 

The following analyses were also
employed to determine the extinction
of the effects of punishment: (a)
Tests for the presence of a linear
trend within the aperiodic severe,
continuous mild, and continuous severe
groups; these tests proved nonsignificant.
(b) Comparison between
the number of antonyms emitted in
the first five extinction trials and
the number emitted in the last five
extinction trials in each of those groups;
the difference is reliable only in the
aperiodic severe group (P < .005).
(c) Tests comparing the number of
antonyms emitted in Trial 10 in each
of these groups with the number
emitted in the nonpunishment (severe
directions) group; the difference
is significant for the continuous
mild and continuous severe groups
(P < .005).

Fig. 1. Mean number of antonym responses emitted per trial by each group
during the punishment and extinction periods.


Reaction Times 


Punishment period.—Figure 2 pre- 
sents the mean reaction time per 
word, in seconds, for the continuous 
and aperiodic groups for each of the 
trials. The data indicate that reac- 
tion times are higher under aperiodic 
and continuous severe punishment
than they are under aperiodic and
continuous mild punishment. Reac- 
tion times are not affected by schedule 
of punishment. 

The results of the analysis of 
variance are presented in Table 2. 
Pair-wise comparison indicated that 
reaction times for the pooled data 
of the continuous and aperiodic mild 
groups are reliably lower than those 
for the continuous and aperiodic 
severe groups (P < .05). The re- 
ward and nonpunishment groups, 
which are not represented on the 
graph, are below, but not reliably 
below, the continuous and aperiodic 
mild groups. Bartlett's test indi- 
cated heterogeneity of variance among 
groups (P < .01). 
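
For readers unfamiliar with Bartlett's test, the statistic can be computed as in the sketch below. This is the standard textbook form of the statistic, not necessarily the computing routine used in this study, and the numbers in the usage example are arbitrary placeholders, not data from the experiment.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (statistic, degrees of freedom) for Bartlett's test.

    groups: a list of samples (each a list of numbers, at least two per
    sample). Under homogeneity of variance the statistic is approximately
    chi-square distributed with k - 1 degrees of freedom.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # unbiased sample variances
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1


# Placeholder numbers only: two samples with very different spread.
stat, df = bartlett_statistic([[1, 2, 3], [0, 10, 20]])
```

A large statistic relative to the chi-square distribution with k - 1 degrees of freedom indicates heterogeneity of variance, which is the situation that led to separate error terms for the pair-wise comparisons.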

Fig. 2. Mean reaction time per word on each trial during the punishment and
extinction periods for the groups receiving punishment.

TABLE 2

ANALYSES OF VARIANCE OF REACTION TIME DATA FOR PUNISHMENT
AND EXTINCTION PERIODS

Extinction period.—Figure 2 presents
the mean reaction time per
word, in seconds, for the continuous
and aperiodic groups, for each of the
10 trials. The data indicate that
the rank ordering of the groups according
to length of reaction time
does not change after punishment is
removed.

The results of the analysis of vari- 
ance are presented in Table 2. Pair- 
wise comparison indicated that reac- 
tion times of the continuous and 
aperiodic mild groups pooled are 
lower than those of the continuous 
and aperiodic severe groups. This 
difference approaches the .05 significance
level (P < .06). The reward
and nonpunishment groups,
which are not represented on the 
graph, are below, but not reliably 
below, the continuous and aperiodic 
mild groups. Bartlett’s test indi- 
cated heterogeneity of variance among 
groups (P < .05). 


DISCUSSION 


The constructs best defining the 
response and reaction time measures, 
as well as the relationship between these 
measures, can be formulated by delineating
for each the conditioning methods
employed and the relevant experimental
variables. The verbal response frequen- 
cies were reduced by an instrumental 
conditioning procedure; i.e., avoidance 
of aversive stimulation was made con- 
tingent upon the avoidance of a particu- 
lar response (antonym). The decrease 
in frequency of punished responses, which 
was related to both schedule and severity 
of punishment, can be conceptualized 
as avoidance learning. Lengthened re- 
action times, which were related only 
to severity of punishment, can be de- 
scribed as one measure of a generalized 
conditioned “anxiety” state. This anxiety
state was presumably conditioned
by a classical procedure; i.e., avoidance 
of aversive stimulation was not contin- 
gent upon the avoidance of a particular 
length of reaction time. 

However, decrease in frequency of 
punished responses can be described not 
only as avoidance learning, but also, 
like lengthened reaction times, as a meas- 
ure of a generalized conditioned anxiety 
state. For example, Estes (1944) ex- 
plains his results by postulating anxiety 
as suppressing the responses of his Ss. 
Since the response frequencies in a num- 
ber of his experiments were lowered 
through instrumental conditioning, it 
might seem sufficient to refer to the 
observed decrease of punished responses
as avoidance learning. However, Estes
also investigated the effect upon rate of 
responding of an aversive stimulus which 
was not paired with the response. He 
cites, for example, an experiment (Estes 
& Skinner, 1941) employing classical con- 
ditioning which demonstrated that a stim- 
ulus previously associated with punish- 
ment depressed ongoing responses although 
neither the punishment nor the stimulus 
had ever been correlated with the par- 
ticular responses blocked. Thus, a de- 
crease in a response can be viewed both 
as emotional suppression under classical 
conditioning and as avoidance learn- 
ing (presumably confounded with emo- 
tional suppression) under instrumental 
conditioning. 

The present experiment indicates that 
response frequencies are affected by 
schedule of aversive stimulation, while 
reaction times are not. The correlation 
between aversive stimulus and antonym 
response is greater under continuous 
reinforcement than under partial rein- 
forcement. Therefore, Ss learn to avoid 
more easily under a continuous schedule 
of punishment. On the other hand, a 
generalized conditioned anxiety state 
(inducing long reaction times) occurs 
irrespective of the correlation between 
specific responses and aversive stimu- 
lation. Perhaps, therefore, in a situa- 
tion where aversive stimulation was not 
correlated with specific responses, re- 
sponse frequencies would behave more 
like reaction times. 

In addition to the foregoing theoretical 
implications, the data can be compared 
with results of other studies. Estes 
(1944) concluded that punishment sup- 
presses a response temporarily and that 
removal of punishment is always followed 
by some recovery in strength of response. 
The present study indicates that the 
aperiodic severe, continuous mild, and 
continuous severe groups remained below 
the nonpunishment groups in total
number of antonyms emitted during 
extinction. There was, however, evi- 
dence of recovery of responses in the 
aperiodic severe group. Two procedures 
in Estes’ experiments which differed 
from the present study should be noted:
(a) extinction periods covered several
days; (b) duration of the effects of 
punishment was determined by a com- 
parison of punishment and nonpunish- 
ment conditions with respect to the 
number of responses necessary or the 
length of time necessary to extinguish 
a previously rewarded response. In 
Estes’ study, the response could be 
completely extinguished; in the present 
study, because the response had con- 
siderable pre-experimental strength, the 
lasting effects of punishment could be 
determined only by comparing the 
punishment and nonpunishment groups 
during the extinction period. 

The majority of previous studies have 
been consistent in supporting the notion 
that partial reinforcement produces more 
resistance to extinction than continuous 
reinforcement (Jenkins & Stanley, 1950). 
One explanation for the relatively per- 
sistent effects of continuous punishment 
in the present study comes from Hull's 
(1941) hypothesis that extinction is
prolonged when the situation during the
extinction period is similar to that
during the conditioning period. He
explained the greater resistance to ex- 
tinction of partial reinforcement than 
of continuous reinforcement by noting 
that, under partial reinforcement, cues 
(aftereffects of nonreinforcement) pres- 
ent in extinction are more similar to 
those available during conditioning than 
are the cues under continuous reinforce- 
ment. This hypothesis suggests that the 
presence of neutral words in the present 
study contributed to poor discriminability 
between conditions of punishment and of 
nonpunishment. The Ss in both the con- 
tinuous and the aperiodic groups were not 
punished for responses to at least 50% 
of the stimulus words (“neutral” words).
It is possible, therefore, that the insertion 
of these words produced a situation in 
which the extinction period for both 
groups contained cues similar to those 
of the punishment period, i.e., cues 
produced by the nonpunishment of 
verbal responses, even though these 
verbal responses were not antonyms. 
When the Ss were no longer punished 
for any responses, the change was not
so abrupt for the continuous punishment
group as it would have been if there had 
been no neutral words. Thus, if the 
neutral words had been excluded, the 
continuous punishment groups might 
have extinguished more rapidly than 
they did. 


SUMMARY 


The experiment measured the effects of 
continuous punishment, aperiodic punish- 
ment, nonpunishment, and reward. A word 
association list was used as stimulus material, 
and the experimental procedures were im- 
posed upon antonym responses. Punish- 
ment was administered either through a 
verbal reprimand (“mild” punishment) or an 
electric shock (“severe” punishment). The
effect of the experimental variables was 
analyzed both for the period during which 
punishment was administered (punishment 
period) and for the period after it was re- 
moved (extinction period). Two behavioral 
measures were employed: (a) number of 
responses emitted and (b) reaction time, the 
time elapsing between the stimulus and the 
S's response. 

The following results were obtained: 

1. During both punishment and extinction
periods, Ss emitted fewer antonym responses
under the aperiodic severe, continuous mild,
and continuous severe punishment procedures
than they did under the aperiodic
mild punishment procedure or under the 
nonpunishment or reward procedures. 

2. During both periods, Ss’ reaction times 
were higher under continuous and aperiodic 
severe punishment conditions than under 
continuous and aperiodic mild conditions. 

The response and reaction time measures
were interpreted in terms of avoidance learning
and conditioned “anxiety” constructs.